The pinhole camera model describes a camera that projects 3D scene points onto the image plane by means of a perspective transformation. It is described by:

\begin{align*} s \mathsf{p} &= \mathsf{K} [ \mathsf{R} | \mathsf{t} ] \mathsf{P} \end{align*}

or

\begin{align*} s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} &= \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \\ (x_d,y_d) &= L(\tilde{x},\tilde{y}) \end{align*}

where:

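\((X,Y,Z)\) are the coordinates of a 3D point in world space; \(\mathsf{P}\) is this point in homogeneous coordinates.
\((u,v)\) are the coordinates (in pixels) of the projection of \((X,Y,Z)\) on the image plane; \(\mathsf{p}\) is this projection in homogeneous coordinates.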
\(\mathsf{K}\) is a 3x3 matrix of intrinsic camera parameters.
\([\mathsf{R}|\mathsf{t}]\) is a 3x4 matrix of extrinsic camera parameters, mapping world space to camera space. It is composed of a 3D rotation followed by a translation.
\((c_x,c_y)\) is the camera's principal point in pixels, the point where the optical axis intersects the image plane. It is usually at the image center.
\(f_x,f_y\) are the camera's horizontal and vertical focal lengths, respectively, expressed in pixel units.
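\(s\) is the projective scale factor. Because the last row of \(\mathsf{K}\) is \((0,0,1)\), it equals the depth of the point in camera space, \(s = r_{31}X + r_{32}Y + r_{33}Z + t_3\).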
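
As a minimal sketch of the projection equation above, the following NumPy snippet maps a world point to pixel coordinates. It is not part of the VPI API: the helper name `project_point` and all numeric values are hypothetical and chosen only for illustration, and lens distortion is ignored.

```python
import numpy as np

def project_point(K, R, t, P_world):
    """Project a 3D world point to pixels with the pinhole model (hypothetical helper)."""
    P_world = np.asarray(P_world, dtype=float)
    # Extrinsics: world space -> camera space (rotation, then translation).
    P_cam = R @ P_world + t
    # Intrinsics: camera space -> homogeneous pixel coordinates s * (u, v, 1).
    sp = K @ P_cam
    s = sp[2]  # scale factor; equals the depth of the point in camera space
    if s <= 0:
        raise ValueError("point is not in front of the camera")
    # Divide by s to obtain the pixel coordinates (u, v).
    return sp[:2] / s, s

# Illustrative values only, not taken from a real calibration.
K = np.array([[800.0,   0.0, 320.0],   # f_x, 0,   c_x
              [  0.0, 800.0, 240.0],   # 0,   f_y, c_y
              [  0.0,   0.0,   1.0]])
R = np.eye(3)      # camera axes aligned with world axes
t = np.zeros(3)    # camera located at the world origin

uv, s = project_point(K, R, t, [0.5, -0.25, 2.0])
print(uv, s)
```

With these assumed values, a point on the optical axis such as \((0,0,Z)\) with \(Z>0\) projects to the principal point \((c_x,c_y)\), which makes a quick sanity check for a set of intrinsics.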

Pinhole camera model

VPI - Vision Programming Interface

3.2 Release