Fast Foundation Stereo#

Fast Foundation Stereo (FFS) is a real-time stereo depth estimation model introduced in “Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching” (Wen et al., NVIDIA, CVPR 2026). FFS distills the full FoundationStereo architecture into a compact model that delivers over 10× faster inference while closely matching the zero-shot accuracy of FoundationStereo across diverse domains including robotics, autonomous vehicles, and industrial inspection.

TAO Toolkit integrates FFS into the depth_net module; select it by setting model.model_type: FastFoundationStereo in your experiment specification file. The model accepts rectified stereo image pairs and produces disparity maps.

The Fast Foundation Stereo model in TAO supports the following tasks:

train
evaluate
inference
export
gen_trt_engine

Architecture#

Fast Foundation Stereo applies a divide-and-conquer strategy to accelerate FoundationStereo across its three stages:

Feature extraction: Hybrid monocular and stereo priors from FoundationStereo are distilled into a single efficient student backbone.
Cost filtering: Blockwise neural architecture search automatically discovers the optimal cost filtering design under a latency budget.
Disparity refinement: A dependency graph models the recurrent structure of the GRU module, enabling structured pruning to eliminate redundancy.

The bp2 commercial checkpoint uses a significantly smaller model configuration than FoundationStereo. Refer to the configuration note in Creating a Configuration File for the exact parameter values required.

Data Input for Fast Foundation Stereo#

Annotation File Format#

Fast Foundation Stereo reads stereo data from a plain text annotation file. Each line specifies one stereo sample with fields separated by spaces:

Columns	Format	Use
2	`<left> <right>`	Inference without ground truth
3	`<left> <right> <disparity>`	Training and evaluation
4	`<left> <right> <disparity> <occlusion>`	Evaluation with occlusion mask

Note

The 4-column format is only supported when dataset_name is Middlebury or Eth3d. Other dataset types (including GenericDataset) will raise an error if given a 4-column annotation file.
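The column rules above can be checked before launching a job. The following is a minimal sketch, not TAO's own loader: the `StereoSample` type and `parse_annotation_lines` function are hypothetical names introduced for illustration, and the sketch assumes that paths contain no spaces.

```python
from typing import List, NamedTuple, Optional

# Dataset names that, per the note above, accept the 4-column format.
OCCLUSION_DATASETS = {"Middlebury", "Eth3d"}


class StereoSample(NamedTuple):
    """One annotation line (hypothetical helper type, not part of TAO)."""
    left: str
    right: str
    disparity: Optional[str] = None
    occlusion: Optional[str] = None


def parse_annotation_lines(lines: List[str], dataset_name: str) -> List[StereoSample]:
    """Split space-separated annotation lines and validate the column count."""
    samples = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (2, 3, 4):
            raise ValueError(f"line {line_no}: expected 2 to 4 fields, got {len(fields)}")
        if len(fields) == 4 and dataset_name not in OCCLUSION_DATASETS:
            raise ValueError(
                f"line {line_no}: 4-column format requires Middlebury or Eth3d, "
                f"got {dataset_name}"
            )
        samples.append(StereoSample(*fields))
    return samples
```

Running a check like this over `train.txt` and `val.txt` catches a stray fourth column before the data loader does.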

Supported Datasets#

Set dataset.dataset_name: StereoDataset at the top level of your specification file. For each entry in a data_sources list, set dataset_name to one of the following values:

`dataset_name`	Description
`Middlebury`	Middlebury stereo benchmark
`Kitti`	KITTI autonomous driving stereo dataset
`Eth3d`	ETH3D low-resolution outdoor stereo dataset
`FSD`	NVIDIA Foundation Stereo Dataset (synthetic)
`IsaacRealDataset`	NVIDIA Isaac real-world stereo data
`Crestereo`	CREStereo large-scale synthetic dataset
`GenericDataset`	Custom stereo data; required for 2-column inference

For details on stereo rectification requirements and data preparation, refer to Stereo Depth Estimation.

Creating a Configuration File#

The experiment specification file is a YAML configuration that defines all parameters for training, evaluation, and inference. The following example shows the configuration for the bp2 commercial checkpoint:

results_dir: /data/result

dataset:
  dataset_name: StereoDataset
  max_disparity: 192
  min_depth: 0.0
  train_dataset:
    data_sources:
      - dataset_name: Middlebury
        data_file: /data/datasets/stereo/train.txt
    batch_size: 1
    workers: 4
    augmentation:
      crop_size: [320, 736]
  val_dataset:
    data_sources:
      - dataset_name: Middlebury
        data_file: /data/datasets/stereo/val.txt
    batch_size: 1
    workers: 4
    augmentation:
      crop_size: [320, 736]

model:
  model_type: FastFoundationStereo
  encoder: vitl
  hidden_dims: [128]
  n_gru_layers: 1
  corr_radius: 4
  corr_levels: 2
  n_downsample: 2
  max_disparity: 192
  valid_iters: 8
  train_iters: 22
  volume_dim: 28
  mixed_precision: false
  gwc_feature_normalize: true
  motion_encoder_widths: [56, 96, 16, 12]
  motion_encoder_final: 48
  gru_hidden: 60
  gru_gating_conv_widths: [100, 168]
  disp_head_input_dim: 60
  disp_head_intermediate: 36
  disp_head_pwconv1_widths: [212, 244]
  mask_widths: [32, 16]
  stem_2_widths: [12, 16]
  spx_2_gru_widths: [16, 12, 16, 24]
  spx_gru_out: 9
  classifier_mid: 14
  cnet_conv04_widths: [60, 48]
  cam_mid_channels: 8
  cost_agg_conv_patch_padding: [0, 0, 0]
  stereo_backbone:
    edgenext_pretrained_path: ""
    depth_anything_v2_pretrained_path: ""
    use_bn: false
    use_clstoken: false

train:
  num_gpus: 1
  num_epochs: 1
  precision: fp32
  pretrained_model_path: /data/checkpoints/model.pth
  optim:
    optimizer: AdamW
    lr: 1.0e-5

evaluate:
  num_gpus: 1
  batch_size: 1
  checkpoint: /data/checkpoints/model.pth

inference:
  save_raw_pfm: true
  num_gpus: 1
  checkpoint: /data/checkpoints/model.pth

export:
  checkpoint: /data/checkpoints/model.pth
  onnx_file: /data/checkpoints/model.onnx
  input_height: 480
  input_width: 736
  opset_version: 17
  batch_size: 1
  valid_iters: 8
  format: onnx

gen_trt_engine:
  onnx_file: /data/checkpoints/model.onnx
  trt_engine: /data/checkpoints/model.engine
  batch_size: 1
  tensorrt:
    data_type: fp16
    workspace_size: 4096
    min_batch_size: 1
    opt_batch_size: 1
    max_batch_size: 1

Note

The following model parameters must match the bp2 commercial checkpoint exactly. TAO applies schema defaults when a field is absent; these defaults differ from the bp2 training values and produce incorrect output without raising an error.

max_disparity: 192 — the schema default is 416. An incorrect value builds an oversized cost volume and shifts predictions out of the trained disparity regime.
gwc_feature_normalize: true — the bp2 checkpoint requires normalized group-wise correlation. Setting this to false produces negative disparity values in approximately 7–8% of pixels.
volume_dim: 28 — the schema default is 32.
hidden_dims: [128] and n_gru_layers: 1 — the FoundationStereo schema defaults are [128, 128, 128] and 3 respectively.
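Because a missing field fails silently, it can help to check the `model` block before running any task. The sketch below is an illustration only: `find_bp2_mismatches` is a hypothetical helper, and it assumes the `model` section of the specification file has already been loaded into a plain dictionary (for example with a YAML parser).

```python
# Values taken from the note above; every one of them must be set explicitly.
BP2_CRITICAL = {
    "max_disparity": 192,
    "gwc_feature_normalize": True,
    "volume_dim": 28,
    "hidden_dims": [128],
    "n_gru_layers": 1,
}


def find_bp2_mismatches(model_cfg: dict) -> dict:
    """Return {field: (found, expected)} for absent or mismatched fields."""
    absent = object()  # sentinel so that an explicit None is still reported as a mismatch
    mismatches = {}
    for key, expected in BP2_CRITICAL.items():
        found = model_cfg.get(key, absent)
        if found is absent:
            mismatches[key] = ("<absent>", expected)
        elif found != expected:
            mismatches[key] = (found, expected)
    return mismatches
```

An empty result means the five fields listed in the note are present and match; it does not validate the remaining width parameters in the sample configuration.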

Key Configuration Parameters#

The following tables describe all available configuration parameters. The top-level results_dir field sets the output directory for all tasks. WandB logging is available and configured via the top-level wandb block; see the ExperimentConfig and WandBConfig fields in depthnet_tables.rst for details.

Dataset Configuration#

Field	value_type	description	default_value	valid_min	valid_max	valid_options	automl_enabled
`dataset_name`	categorical	Dataset name	StereoDataset			MonoDataset,StereoDataset
`normalize_depth`	bool	Whether to normalize depth	FALSE
`max_depth`	float	Maximum depth in meters in MetricDepthAnythingV2		1.0	inf
`min_depth`	float	Minimum depth in meters in MetricDepthAnythingV2		0.0	inf
`max_disparity`	int	Maximum allowed disparity for which we compute losses during training	416	1	416
`baseline`	float	Baseline for stereo datasets	0.193001	0.0	inf
`focal_x`	float	Focal length along x-axis	1998.842	0.0	inf
`train_dataset`	collection	Configurable parameters to construct the train dataset for a DepthNet experiment					FALSE
`val_dataset`	collection	Configurable parameters to construct the val dataset for a DepthNet experiment					FALSE
`test_dataset`	collection	Configurable parameters to construct the test dataset for a DepthNet experiment					FALSE
`infer_dataset`	collection	Configurable parameters to construct the infer dataset for a DepthNet experiment					FALSE

Model Configuration#

Field	value_type	description	default_value	valid_min	valid_max	valid_options	automl_enabled
`model_type`	categorical	Network name	MetricDepthAnythingV2			FoundationStereo,MetricDepthAnything,RelativeDepthAnything
`mono_backbone`	collection	Network defined paths for Monocular DepthNet Backbone					FALSE
`stereo_backbone`	collection	Network defined paths for Edgenext and Depthanythingv2					FALSE
`hidden_dims`	list	Hidden dimensions	[128, 128, 128]				FALSE
`corr_radius`	int	Width of the correlation pyramid	4	1			TRUE
`cv_group`	int	cv group	8	1			TRUE
`train_iters`	int	Train iteration	22	1			TRUE
`valid_iters`	int	Validation iteration	22	1
`volume_dim`	int	Volume dimension	32	1			TRUE
`low_memory`	int	reduce memory usage	0	0	4
`mixed_precision`	bool	Whether to use mixed precision training	FALSE
`n_gru_layers`	int	Number of hidden GRU levels	3	1	3
`corr_levels`	int	Number of levels in the correlation pyramid	2	1	2
`n_downsample`	int	Resolution of the disparity field (1/2^K)	2	1	2
`encoder`	categorical	DepthAnythingV2 Encoder options	vitl			vits,vitl
`max_disparity`	int	Maximum disparity of the model used in the training of a stereo model	416

Note

Set model_type to FastFoundationStereo for this model. The shared table above lists only the values applicable to other model types; FastFoundationStereo is valid for this model but is not shown in the shared table.

Stereo Backbone Configuration#

Field	value_type	description	default_value
`depth_anything_v2_pretrained_path`	string	Path to load DepthAnythingv2 as an encoder for Stereo DepthNet (FoundationStereo)
`edgenext_pretrained_path`	string	Path to load edgenext encoder for Stereo DepthNet (FoundationStereo)
`use_bn`	bool	Whether to use batch normalization in DepthAnythingV2	FALSE
`use_clstoken`	bool	Whether to use class token	FALSE

Training Configuration#

Field	value_type	description	default_value	valid_min	valid_max	valid_options	automl_enabled
`num_gpus`	int	Number of GPUs to run the train job.	1	1
`gpu_ids`	list	List of GPU IDs to run the training on. The length of this list must be equal to the number of gpus in train.num_gpus.	[0]				FALSE
`num_nodes`	int	Number of nodes to run the training on. If > 1, then multi-node is enabled.	1	1
`seed`	int	Seed for the initializer in PyTorch. If < 0, disable fixed seed.	1234	-1	inf
`cudnn`	collection						FALSE
`num_epochs`	int	Number of epochs to run the training.	10	1	inf
`checkpoint_interval`	int	Interval (in epochs) at which a checkpoint is to be saved; helps resume training.	1	1
`checkpoint_interval_unit`	categorical	Unit of the checkpoint interval.	epoch			epoch,step
`validation_interval`	int	Interval (in epochs) at which an evaluation will be triggered on the validation dataset.	1	1
`resume_training_checkpoint_path`	string	Path to the checkpoint from which to resume training.
`results_dir`	string	Path to where all the assets generated from a task are stored.
`checkpoint_interval_steps`	int	Number of steps to save the checkpoint.
`pretrained_model_path`	string	Path to a pretrained DepthNet model from which to initialize the current training.
`clip_grad_norm`	float	Amount to clip the gradient by L2 Norm. A value of 0.0 specifies no clipping.	0.1
`dataloader_visualize`	bool	Whether to visualize the dataloader.	FALSE				TRUE
`vis_step_interval`	int	Visualization interval in step.	10				TRUE
`is_dry_run`	bool	Whether to run the trainer in Dry Run mode. This serves as a good means to validate the specification file and run a sanity check on the trainer without actually initializing and running the trainer.	FALSE
`optim`	collection	Hyperparameters to configure the optimizer.					FALSE
`precision`	categorical	Precision on which to run the training.	fp32			bf16,fp32,fp16
`distributed_strategy`	categorical	Multi-GPU training strategy. DDP (Distributed Data Parallel) and Fully Sharded DDP are supported.	ddp			ddp,fsdp
`activation_checkpoint`	bool	Whether to recompute activations in the backward pass to save GPU memory (TRUE) or store them (FALSE).	TRUE
`verbose`	bool	Whether to display verbose logs to console.	FALSE
`inference_tile`	bool	Whether to use tiled inference, particularly for transformers which expect fixed size of sequences.	FALSE
`tile_wtype`	string	Weight type for tiled inference.	gaussian
`tile_min_overlap`	list	Minimum overlap for tile.	[16, 16]				FALSE
`log_every_n_steps`	int	Interval steps of logging training results and running validation numbers within one epoch.	500

Optimizer Configuration#

Field	value_type	description	default_value	valid_min	valid_max	valid_options	automl_enabled
`optimizer`	categorical	Type of optimizer used to train the network	AdamW			AdamW,SGD
`monitor_name`	categorical	Metric value to be monitored for the `AutoReduce` Scheduler	val_loss			val_loss,train_loss
`lr`	float	Initial learning rate for training the model, excluding the backbone	0.0001				TRUE
`momentum`	float	Momentum for the AdamW optimizer	0.9				TRUE
`weight_decay`	float	Weight decay coefficient	0.0001				TRUE
`lr_scheduler`	categorical	Learning rate scheduler. MultiStepLR: decrease the lr by lr_decay at each step in lr_steps. StepLR: decrease the lr by lr_decay every lr_step_size steps.	MultiStepLR			MultiStep,StepLR,CustomMultiStepLRScheduler,LambdaLR,PolynomialLR,OneCycleLR,CosineAnnealingLR
`lr_steps`	list	Steps at which the learning rate must be decreased. This is applicable only with the MultiStep LR scheduler.	[1000]				FALSE
`lr_step_size`	int	Number of steps to decrease the learning rate in the StepLR	1000				TRUE
`lr_decay`	float	Decreasing factor for the learning rate scheduler	0.1				TRUE
`min_lr`	float	Minimum learning rate value for the learning rate scheduler	1e-07				TRUE
`warmup_steps`	int	Number of steps to perform linear learning rate warm-up before engaging a learning rate scheduler	20	0	inf

Evaluation Configuration#

Field	value_type	description	default_value	valid_min	automl_enabled
`num_gpus`	int	Number of GPUs to run the evaluation job.	1	1
`gpu_ids`	list	List of GPU IDs to run the evaluation on. The length of this list must be equal to the number of GPUs in `evaluate.num_gpus`.	[0]		FALSE
`num_nodes`	int	Number of nodes to run the evaluation on. If > 1, then multi-node is enabled.	1	1
`checkpoint`	string	Path to the checkpoint used for evaluation.	???
`trt_engine`	string	Path to the TensorRT engine to be used for evaluation. This only works with `tao-deploy`.
`results_dir`	string	Path to where all the assets generated from a task are stored.
`batch_size`	int	Batch size of the input Tensor. This is important if `batch_size` > 1 for a large dataset.	-1	-1
`input_width`	int	Width of the input image tensor.	736	1
`input_height`	int	Height of the input image tensor.	320	1

Inference Configuration#

Field	value_type	description	default_value	valid_min	automl_enabled
`num_gpus`	int	Number of GPUs to run the inference job.	1	1
`gpu_ids`	list	List of GPU IDs to run the inference on. The length of this list must be equal to the number of gpus in `inference.num_gpus`.	[0]		FALSE
`num_nodes`	int	Number of nodes to run the inference on. If > 1, then multi-node is enabled.	1	1
`checkpoint`	string	Path to the checkpoint used for inference.	???
`trt_engine`	string	Path to the TensorRT engine to be used for inference. This only works with `tao-deploy`.
`results_dir`	string	Path to where all the assets generated from a task are stored.
`batch_size`	int	Batch size of the input Tensor. This is important if batch_size > 1 for a large dataset.	-1	-1
`conf_threshold`	float	Value of the confidence threshold to be used when filtering out the final list of boxes.	0.5
`input_width`	int	Width of the input image tensor.		1
`input_height`	int	Height of the input image tensor.		1
`save_raw_pfm`	bool	Whether to save the raw pfm output during inference.	FALSE

Export Configuration#

Field	value_type	description	default_value	valid_min	valid_options
`results_dir`	string	Path to where all the assets generated from a task are stored.
`gpu_id`	int	Index of the GPU to build the TensorRT engine.	0
`checkpoint`	string	Path to the checkpoint file to run export.	???
`onnx_file`	string	Path to the onnx model file.	???
`on_cpu`	bool	Whether to export CPU compatible model.	FALSE
`input_channel`	ordered_int	Number of channels in the input Tensor.	3	1	1,3
`input_width`	int	Width of the input image tensor.	960	32
`input_height`	int	Height of the input image tensor.	544	32
`opset_version`	int	Operator set version of the ONNX model used to generate TensorRT engine.	17	1
`batch_size`	int	Batch size of the input Tensor for the engine. A value of `-1` implies dynamic tensor shapes.	-1	-1
`verbose`	bool	Whether to enable verbose TensorRT logging.	FALSE
`format`	categorical	File format to export to.	onnx		onnx,xdl
`valid_iters`	int	Number of GRU iterations to export the model.	22	1

TensorRT Engine Configuration#

Field	value_type	description	default_value	valid_min	automl_enabled
`results_dir`	string	Path to where all the assets generated from a task are stored.
`gpu_id`	int	Index of the GPU to build the TensorRT engine.	0	0
`onnx_file`	string	Path to the ONNX model file.	???
`trt_engine`	string	Path where the generated TensorRT engine should be stored. This only works with `tao-deploy`.	???
`timing_cache`	string	Path to a TensorRT timing cache that speeds up engine generation. This will be created/read/updated.
`batch_size`	int	Batch size of the input tensor for the engine. A value of `-1` implies dynamic tensor shapes.	-1	-1
`verbose`	bool	Whether to enable verbose TensorRT logging.	FALSE
`tensorrt`	collection	Hyperparameters to configure the TensorRT Engine builder.			FALSE

Augmentation Configuration#

Field	value_type	description	default_value	valid_min	valid_max	automl_enabled
`input_mean`	list	Input mean for RGB frames	[0.485, 0.456, 0.406]			FALSE
`input_std`	list	Input standard deviation per pixel for RGB frames	[0.229, 0.224, 0.225]			FALSE
`crop_size`	list	Crop size for input RGB images [height, width]	[518, 518]			FALSE
`min_scale`	float	Minimum scale in data augmentation	-0.2	0.2	1
`max_scale`	float	Maximum scale in data augmentation	0.4	-0.2	1
`do_flip`	bool	Whether to perform flip in data augmentation	FALSE
`yjitter_prob`	float	Probability for y jitter	1.0	0.0	1.0	TRUE
`gamma`	list	Gamma range in data augmentation	[1, 1, 1, 1]			FALSE
`color_aug_prob`	float	Probability for asymmetric color augmentation	0.2	0.0	1.0	TRUE
`color_aug_brightness`	float	Color jitter brightness	0.4	0.0	1.0
`color_aug_contrast`	float	Color jitter contrast	0.4	0.0	1.0
`color_aug_saturation`	list	Color jitter saturation	[0.0, 1.4]			FALSE
`color_aug_hue_range`	list	Hue range in data augmentation	[-0.027777777777777776, 0.027777777777777776]			FALSE
`eraser_aug_prob`	float	Probability for eraser augmentation	0.5	0.0	1.0	TRUE
`spatial_aug_prob`	float	Probability for spatial augmentation	1.0	0.0	1.0	TRUE
`stretch_prob`	float	Probability for stretch augmentation	0.8	0.0	1.0	TRUE
`max_stretch`	float	Maximum stretch augmentation	0.2	0.0	1.0
`h_flip_prob`	float	Probability for horizontal flip augmentation	0.5	0.0	1.0	TRUE
`v_flip_prob`	float	Probability for vertical flip augmentation	0.5	0.0	1.0	TRUE
`hshift_prob`	float	Probability for horizontal shift augmentation	0.5	0.0	1.0	TRUE
`crop_min_valid_disp_ratio`	float	Probability for minimum crop valid disparity ratio	0.0	0.0	1.0	TRUE

Training the Model#

To start training, run the train task using your experiment specification file. TAO initializes the model from train.pretrained_model_path if provided. To resume an interrupted run, set train.resume_training_checkpoint_path to the checkpoint path; if left empty, TAO automatically resumes from the latest checkpoint in results_dir.

tao depth_net train -e /path/to/spec.yaml

Training Output#

The training process generates the following outputs under the results_dir directory:

train/dn_model_latest.pth: Latest model checkpoint
train/model_XXX_YYYYY.pth: Periodic checkpoints (zero-padded epoch and step)
train/events.out.tfevents.*: TensorBoard log files
train/status.json: Training status and metrics

You can monitor training progress using TensorBoard:

tensorboard --logdir=/path/to/results/train

Note

The checkpoint path for subsequent actions follows the pattern <results_dir>/train/dn_model_latest.pth.

Evaluating the Model#

To evaluate a PyTorch checkpoint, set evaluate.checkpoint in your specification file. To evaluate a TensorRT engine, set evaluate.trt_engine instead (requires tao-deploy).

tao depth_net evaluate -e /path/to/spec.yaml

Evaluation Metrics#

For stereo depth estimation, TAO computes the following metrics. Lower is better for all metrics.

End-point error (epe): Mean absolute difference between predicted and ground-truth disparity in pixels.
Bad pixel rates (bp1, bp2, bp3): Percentage of pixels with disparity error exceeding 1, 2, and 3 pixels respectively.
D1 outlier rate (d1): Percentage of pixels where the disparity error exceeds both 3 pixels and 5% of the ground-truth disparity.
Absolute relative error (abs_rel): Mean of |predicted - ground_truth| / ground_truth.
Squared relative error (sq_rel): Sum of squared disparity errors divided by the sum of absolute deviations of ground-truth disparity from its mean.
Root mean square error (rmse): Root mean square error of the disparity values.
RMSE log (rmse_log): Root mean square error computed in log space.
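To make some of the definitions above concrete, the sketch below computes `epe`, `bp1` to `bp3`, `d1`, and `abs_rel` on flat lists of disparity values. It is not TAO's implementation: `disparity_metrics` is a hypothetical name, and treating only pixels with positive ground truth as valid is an assumption made for this example.

```python
from typing import Dict, Sequence


def disparity_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Compute a subset of the stereo metrics from flat disparity lists."""
    # Assumption: a ground-truth value <= 0 marks an invalid pixel.
    valid = [(abs(p - g), g) for p, g in zip(pred, gt) if g > 0]
    if not valid:
        raise ValueError("no valid ground-truth pixels")
    n = len(valid)

    # End-point error: mean absolute disparity error in pixels.
    metrics = {"epe": sum(e for e, _ in valid) / n}

    # Bad pixel rates: percentage of pixels with error above 1, 2, and 3 px.
    for threshold in (1, 2, 3):
        metrics[f"bp{threshold}"] = 100.0 * sum(e > threshold for e, _ in valid) / n

    # D1: error must exceed both 3 px and 5% of the ground-truth disparity.
    metrics["d1"] = 100.0 * sum(e > 3 and e > 0.05 * g for e, g in valid) / n

    # Absolute relative error: mean of |pred - gt| / gt.
    metrics["abs_rel"] = sum(e / g for e, g in valid) / n
    return metrics
```

Note the difference between `bp3` and `d1`: a 4-pixel error on a ground-truth disparity of 100 counts toward `bp3` but not toward `d1`, because it stays below the 5% relative threshold.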

TAO displays these metrics in the console output. For PyTorch evaluation, metrics are also logged to results_dir/evaluate/status.json. For TRT evaluation (tao-deploy), metrics are saved to results_dir/trt_evaluate/results.json.

Running Inference#

To run PyTorch inference, set inference.checkpoint in your specification file. To run TensorRT inference, set inference.trt_engine instead (requires tao-deploy).

tao depth_net inference -e /path/to/spec.yaml

Inference Output#

The inference process generates:

Colorized disparity visualizations in PNG format (PyTorch): results_dir/inference/ with dataset-relative paths
Colorized disparity visualizations in PNG format (TRT, tao-deploy): results_dir/trt_inference/predicted_depth/
Raw disparity values in PFM format (PyTorch path only; enable with inference.save_raw_pfm: true)
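The raw PFM files can be read with a few lines of standard-library Python. The reader below is a minimal sketch that follows the common single-channel PFM layout (a `Pf` header, a width and height line, a scale line whose sign encodes byte order, then 32-bit floats stored bottom row first). That TAO writes single-channel `Pf` files is an assumption here, and `read_pfm` is a hypothetical helper name.

```python
import struct
from typing import List, Tuple


def read_pfm(path: str) -> Tuple[List[List[float]], int, int]:
    """Read a single-channel PFM file and return (rows, width, height)."""
    with open(path, "rb") as f:
        magic = f.readline().decode("ascii").strip()
        if magic != "Pf":
            raise ValueError(f"expected single-channel PFM header 'Pf', got {magic!r}")
        width, height = (int(v) for v in f.readline().decode("ascii").split())
        scale = float(f.readline().decode("ascii").strip())
        # A negative scale means little-endian floats, positive means big-endian.
        endian = "<" if scale < 0 else ">"
        count = width * height
        data = struct.unpack(f"{endian}{count}f", f.read(4 * count))
    rows = [list(data[r * width:(r + 1) * width]) for r in range(height)]
    rows.reverse()  # PFM stores the bottom row first; return top row first
    return rows, width, height
```

For large images, an array library is a better fit than nested lists; the header handling stays the same.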

You can convert disparity to metric depth using the following formula:

depth = (baseline * focal_x) / disparity
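As a per-pixel sketch of this formula, the function below takes the baseline and focal length as arguments. It assumes the baseline is in meters and that `focal_x` and the disparity are both in pixels at the same image resolution, which gives depth in meters. Mapping non-positive disparities to infinity is a choice made for this example, and `disparity_to_depth` is a hypothetical name.

```python
import math


def disparity_to_depth(disparity: float, baseline: float, focal_x: float) -> float:
    """Convert one disparity value (pixels) to depth (same unit as baseline)."""
    if disparity <= 0:
        # Zero or negative disparity has no finite depth; treated as infinitely far here.
        return math.inf
    return (baseline * focal_x) / disparity


# Example with illustrative values: 0.2 m baseline, 1000 px focal length.
depth_m = disparity_to_depth(50.0, baseline=0.2, focal_x=1000.0)
```

If the image was resized before inference, scale `focal_x` by the same horizontal factor so that it matches the resolution of the disparity map.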

Exporting the Model#

The export task converts a trained PyTorch checkpoint to ONNX format. TAO always exports an fp32 ONNX file regardless of the model.mixed_precision setting; precision selection happens at the gen_trt_engine step.

Configure export.checkpoint, export.onnx_file, export.input_height, export.input_width, export.opset_version, export.batch_size, and export.valid_iters in your specification file.

To export a dynamic-shape ONNX that accepts variable input resolutions, set export.dynamic_hw: true (FFS only; not supported by other depth models). Input height and width must each be divisible by 32.

tao depth_net export -e /path/to/spec.yaml

Generating a TensorRT Engine#

The gen_trt_engine task converts an ONNX model into an NVIDIA® TensorRT™ engine for optimized inference. Configure gen_trt_engine.onnx_file, gen_trt_engine.trt_engine, gen_trt_engine.gpu_id, gen_trt_engine.batch_size, and the gen_trt_engine.tensorrt block (data_type, workspace_size, min_batch_size, opt_batch_size, max_batch_size) in your specification file.

Note

Set gen_trt_engine.tensorrt.workspace_size to 4096 MB. Fast Foundation Stereo requires more workspace memory than the default value of 1024 MB.

tao depth_net gen_trt_engine -e /path/to/spec.yaml

For production deployment, use a static-shape fp16 engine. For variable input resolutions, build a dynamic-shape engine from an ONNX exported with dynamic_hw: true and supply an optimization profile with min_height, opt_height, max_height, min_width, opt_width, max_width (all divisible by 32). Dynamic-shape engines may show slightly larger disparity drift than static-shape engines.
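The divisibility constraint on the optimization profile is easy to get wrong by hand. The sketch below checks a profile given as a plain dictionary that uses the six field names above. `check_profile` is a hypothetical helper, where the profile lives in the specification file is not assumed, and the `min <= opt <= max` ordering check reflects the general TensorRT requirement for optimization profiles.

```python
from typing import Dict, List

STAGES = ("min", "opt", "max")


def check_profile(profile: Dict[str, int]) -> List[str]:
    """Return a list of problems found in a dynamic-shape profile (empty if none)."""
    problems = []
    for dim in ("height", "width"):
        values = [profile[f"{stage}_{dim}"] for stage in STAGES]
        # Each dimension must be divisible by 32.
        for stage, value in zip(STAGES, values):
            if value % 32 != 0:
                problems.append(f"{stage}_{dim}={value} is not divisible by 32")
        # TensorRT profiles require min <= opt <= max for every dimension.
        if not values[0] <= values[1] <= values[2]:
            problems.append(f"{dim}: expected min <= opt <= max, got {values}")
    return problems
```

For example, a profile with `opt_height: 500` is reported because 500 is not a multiple of 32; the nearest valid values are 480 and 512.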

Best Practices#

Training Recommendations#

Precision: Use train.precision: fp32 for fine-tuning. fp16 and bf16 are supported but may degrade accuracy.
Pretrained checkpoint: Initialize from the bp2 checkpoint via train.pretrained_model_path. Use a learning rate of 1e-5 with an AdamW optimizer.
Batch size: Use dataset.train_dataset.batch_size: 1 for variable-aspect datasets such as Middlebury, KITTI, and ETH3D.
Crop size: Match dataset.train_dataset.augmentation.crop_size to dataset.val_dataset.augmentation.crop_size for consistent evaluation during training.

Performance Optimization#

TensorRT: Use a static-shape fp16 TensorRT engine for production inference; it gives the lowest latency and the smallest disparity drift relative to the PyTorch baseline.
Inference iterations: The bp2 checkpoint was distilled for valid_iters: 8; increasing valid_iters beyond 8 does not improve quality. train_iters: 22 is separate and controls training-time supervision only.
Memory: Set model.low_memory: 1 to reduce peak GPU memory at a small throughput cost. Values above 1 have no additional effect.

Troubleshooting#

Common Issues#

Large disparity drift:

Verify that model.max_disparity: 192 is set explicitly. Refer to the configuration note above for the full list of bp2-critical parameters.

Negative disparity values:

Verify that model.gwc_feature_normalize: true is set. Refer to the configuration note above for details.

Model load error or shape mismatch:

Verify that all model configuration values match the sample configuration in this document. Mismatched values cause the model to build with an architecture that differs from what the bp2 checkpoint expects.

Additional Resources#

For more information about stereo depth estimation with FoundationStereo, refer to Stereo Depth Estimation.