Camera Positioning Guide

Camera placement can have a large impact on overall accuracy and performance of Metropolis AI Workflows (Reference Applications) such as Multi-Camera Tracking or Occupancy Analytics.

We recommend reviewing the following definitions and suggestions before installing new cameras.

Definitions

  • Pitch: The angle between the camera center line and the horizontal plane (the rotation around the side-to-side axis).

  • Height: The mounting height of the camera from the ground.

  • Roll: The camera rotation around the front-to-back axis.

  • H-FOV: Horizontal Field of View, the horizontal extent of a camera’s range of observation.

  • DIS - Distance to center: The distance from the camera to the center of the area of detection.
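Height, Pitch, and DIS are linked by basic trigonometry. The sketch below is only an illustration under stated assumptions: the floor is flat, and DIS is measured along the camera center line to the point where that line meets the floor. The function name `center_line_distances` is hypothetical and not part of any Metropolis API.

```python
import math


def center_line_distances(height_ft: float, pitch_deg: float) -> tuple:
    """Return (slant_ft, ground_ft) for the camera center line.

    Assumes a flat floor. slant_ft is the distance along the center line
    from the camera to the floor; ground_ft is the same distance measured
    along the floor from the point directly below the camera.
    """
    if not 0.0 < pitch_deg <= 90.0:
        raise ValueError("pitch must be in (0, 90] degrees to hit the floor")
    pitch = math.radians(pitch_deg)
    slant_ft = height_ft / math.sin(pitch)
    ground_ft = height_ft / math.tan(pitch)
    return slant_ft, ground_ft


# A camera mounted 12 ft high with a 30 degree pitch.
slant, ground = center_line_distances(12.0, 30.0)
print(f"slant: {slant:.1f} ft, ground: {ground:.1f} ft")
```

Under these assumptions, a lower pitch at the same height pushes the center of the view farther away, which is why height, pitch, and distance need to be chosen together.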

Guideline

For Each Camera

  • Keep the camera’s Pitch between 15 and 45 degrees. The smaller the pitch, the better the detection model can extract appearance features. The larger the pitch, the more accurately the calibration can project detections onto the plan map. Keeping the Pitch within the recommended range balances these two effects and improves the overall performance of the reference applications.

  • Mount the camera 8 to 14 ft above the ground. Mounting within this range makes it easier to cover the area of detection while keeping the camera’s Pitch within the recommended range.

  • Strive to keep the camera roll at 0 degrees. Do not mount the camera sideways or upside-down. Larger roll angles introduce more errors in estimating the location of detected people in the plan map.

  • The camera’s FOV (field of view) should be between 80 and 120 degrees.

  • Distance DIS should be no less than 12 ft and no more than 40 ft. Ensure that the center of the area of interest is in the center of the camera’s FOV.

  • The camera view should be linear (free of lens distortion). Any warping effect or distortion introduces errors in estimating the location of detected people in the plan map.

  • At least four reference points on the floor of the plan map should be identifiable in the camera view. The reference points are used to create projections from the camera view onto the plan map, so human error in selecting them reduces the accuracy of the estimated locations of detected people in the plan map. Select reference points as precisely as possible, and be creative in finding them: wall corners, the foot of a shelf, the corner of a pillar, or markers (even temporarily created ones) all work.

  • Each camera should look at a clear pathway, e.g., a lane between shelves in a retail store, to reduce occlusion as much as possible.
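The numeric ranges in the list above lend themselves to a quick pre-installation check. The following is a minimal sketch, not part of any Metropolis tooling: the `CameraSetup` class, the `check_setup` function, and the 2 degree roll tolerance are all assumptions made for illustration (the guideline only says to strive for 0 degrees of roll).

```python
from dataclasses import dataclass


@dataclass
class CameraSetup:
    """Hypothetical record of one camera's planned installation."""
    pitch_deg: float
    height_ft: float
    roll_deg: float
    hfov_deg: float
    dis_ft: float


# Recommended ranges taken from the guideline above.
RECOMMENDED_RANGES = {
    "pitch_deg": (15.0, 45.0),
    "height_ft": (8.0, 14.0),
    "hfov_deg": (80.0, 120.0),
    "dis_ft": (12.0, 40.0),
}

# Assumption: the guide gives no tolerance for roll; 2 degrees is illustrative.
ROLL_TOLERANCE_DEG = 2.0


def check_setup(cam: CameraSetup) -> list:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for name, (low, high) in RECOMMENDED_RANGES.items():
        value = getattr(cam, name)
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} to {high}")
    if abs(cam.roll_deg) > ROLL_TOLERANCE_DEG:
        problems.append(f"roll_deg={cam.roll_deg} should be close to 0")
    return problems


# A camera mounted too low and pointing straight ahead fails two checks.
for problem in check_setup(CameraSetup(0.0, 5.0, 0.0, 100.0, 24.0)):
    print(problem)
```

A check like this covers only the numeric items; distortion, reference points, and occlusion still need to be verified on the actual camera view.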

The single-camera guidelines above serve the following purposes:

  • Occlusion of the people to be detected should be minimal. (Most important)

  • The people to be detected should appear large enough in the camera view, preferably covering more than 5000 pixels (roughly 70x70).

  • The people to be detected should be observed from a side angle rather than a top-down angle so that better appearance features can be extracted.

  • The foot location of the people to be detected should be clearly visible on the floor (so that their location in the plan map can be estimated).
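The foot location matters because it is the point that gets projected onto the plan map. One common way to do this with four floor reference points is a planar homography; the guide does not state which method the reference applications use internally, so treat the following as a sketch under that assumption. It uses only the standard library, assumes a flat floor and an undistorted view, and all function names and coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate reference points (collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_points(image_pts, map_pts):
    """Build a 3x3 homography from exactly four image-to-map point pairs."""
    if len(image_pts) != 4 or len(map_pts) != 4:
        raise ValueError("this sketch needs exactly four point pairs")
    a, b = [], []
    for (x, y), (u, v) in zip(image_pts, map_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h, x, y):
    """Map an image pixel (x, y) to plan-map coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


def foot_point(box):
    """Bottom-center of a bounding box given as (x_min, y_min, x_max, y_max)."""
    x_min, _, x_max, y_max = box
    return (x_min + x_max) / 2.0, y_max


# Hypothetical reference points: pixels in the image, feet on the plan map.
image_pts = [(400, 300), (880, 300), (1180, 700), (100, 700)]
map_pts = [(0, 20), (10, 20), (10, 0), (0, 0)]
H = homography_from_points(image_pts, map_pts)
print(project(H, *foot_point((600, 450, 680, 700))))
```

If the bounding box is cut off above the feet, `foot_point` returns a pixel that is not on the floor, and the projected location shifts accordingly. The same applies to imprecise reference points, since every projection inherits their error.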

Across All Cameras

  • Cover as much of the walking area as possible. Strive to cover all floor spaces.

  • Some overlap between cameras’ FOVs is helpful because it facilitates spatial-temporal association. Reduce the travel gaps between any two neighboring cameras.

  • Color and lighting should be consistent across cameras; inconsistencies can reduce re-identification accuracy. Address any color distortion and avoid strong lighting contrast.
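Floor coverage and overlap can be estimated on the plan map before installation. The sketch below is a deliberately simplified 2D model: each camera is a wedge defined by its position, the heading of its center line, its H-FOV, and a maximum usable range. It ignores occlusion by shelves or walls, mounting height, and vertical FOV, and every name in it (`FloorCamera`, `sees`, `coverage_report`) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FloorCamera:
    """Hypothetical top-down model of one camera on the plan map."""
    x_ft: float
    y_ft: float
    heading_deg: float   # center-line direction, counterclockwise from +x
    hfov_deg: float
    max_range_ft: float


def sees(cam: FloorCamera, px: float, py: float) -> bool:
    """True if the floor point lies inside the camera's 2D wedge."""
    dx, dy = px - cam.x_ft, py - cam.y_ft
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > cam.max_range_ft:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the angular offset into the range [-180, 180).
    offset = (bearing - cam.heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= cam.hfov_deg / 2.0


def coverage_report(cameras, width_ft, depth_ft, step_ft=1.0):
    """Count grid points seen by no camera and by two or more cameras."""
    nx = int(width_ft // step_ft) + 1
    ny = int(depth_ft // step_ft) + 1
    report = {"total": 0, "uncovered": 0, "overlap": 0}
    for i in range(nx):
        for j in range(ny):
            views = sum(sees(c, i * step_ft, j * step_ft) for c in cameras)
            report["total"] += 1
            if views == 0:
                report["uncovered"] += 1
            elif views >= 2:
                report["overlap"] += 1
    return report


# Two cameras in opposite corners of a 20 ft x 20 ft room, facing inward.
cams = [
    FloorCamera(0.0, 0.0, 45.0, 100.0, 40.0),
    FloorCamera(20.0, 20.0, 225.0, 100.0, 40.0),
]
print(coverage_report(cams, 20.0, 20.0))
```

Because occlusion is not modeled, a clean report from this sketch is necessary but not sufficient; a room with tall shelves can still hide people from cameras that nominally cover the floor.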

The multi-camera guidelines above serve the following purpose:

  • A person standing anywhere in the area of detection should have their front side (from head to feet) clearly visible to at least one camera.

Note

These multi-camera setup guidelines are primarily written for the Metropolis Multi-Camera Tracking reference application.

Examples

Single Camera

Consider the following two views as a comparison:

  • View1: Camera mounted about 5 ft high with a Pitch of about 0 degrees.

  • View2: Camera mounted about 12 ft high with a Pitch of about 30 degrees.

[Figure: example1]

From the perspective of the Metropolis reference applications, View2 is better than View1 for the following reasons:

  • In View1, occlusions between moving objects are more likely because of the low mounting height.

[Figure: example2]

  • In View1, bad detections are more likely because people can be partially outside the camera view. In the sample image below, the detected bounding box does not cover the feet, so the location of the detected person will be completely off in the plan map.

[Figure: example3]

  • In View1, the location estimate in the plan map will jitter more than in View2 because of the small Pitch angle.

[Figure: example4]
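The jitter difference between the two views can be made concrete with a rough geometric estimate. This is a sketch under simplifying assumptions, not the method used by the reference applications: it assumes a flat floor and models detection noise as a small angular error in the ray from the camera to the person's feet. With height h and ground distance d, the ray's depression angle satisfies tan(angle) = h / d, so a small angular error shifts d by about (h² + d²) / h per radian. The function name `ground_jitter_ft` and the 0.5 degree error are hypothetical.

```python
import math


def ground_jitter_ft(height_ft: float, ground_distance_ft: float,
                     angle_error_deg: float) -> float:
    """Approximate shift in estimated ground distance for a small angular error.

    Derived from d = h / tan(angle): |dd/d(angle)| = h / sin(angle)^2,
    which equals (h^2 + d^2) / h. Valid only for small errors on a flat floor.
    """
    sensitivity_ft_per_rad = (
        height_ft ** 2 + ground_distance_ft ** 2
    ) / height_ft
    return sensitivity_ft_per_rad * math.radians(angle_error_deg)


# Same person 20 ft away, same assumed 0.5 degree error, two mounting heights.
view1 = ground_jitter_ft(5.0, 20.0, 0.5)    # low camera, as in View1
view2 = ground_jitter_ft(12.0, 20.0, 0.5)   # higher camera, as in View2
print(f"View1: {view1:.2f} ft, View2: {view2:.2f} ft")
```

In this model the low camera sees the floor at a grazing angle, so the same angular error moves the estimated position almost twice as far as it does for the higher, more steeply pitched camera.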

Multi-Camera

Assume there is a room with two tall shelves in the middle and consider the following two multi-camera setups as a comparison:

  • Setup 1: Use two large-FOV cameras, one on each side, to cover the room.

  • Setup 2: Use four cameras, one in each corner, to cover the side spaces of the room, and two more cameras to cover the corridor between the two tall shelves.

[Figures: Setup 1 and Setup 2]

From the perspective of the Metropolis MTMC reference application, Setup 2 is better than Setup 1 for the following reasons:

  • In Setup 1, people standing in the corridor between the two shelves are likely to be occluded in the camera view.

  • In Setup 1, areas near the corners of the room might not be well covered by the camera FOV, or they might be close to its edge. People standing in those areas are likely to be invisible or truncated by the FOV edges.