## VPI - Vision Programming Interface

#### 2.4 Release

Pinhole Camera Model

The pinhole camera model describes a camera that projects 3D scene points onto the image plane by means of a perspective transformation. It is described by:

\begin{align*} s \mathsf{p} &= \mathsf{K} [ \mathsf{R} | \mathsf{t} ] \mathsf{P} \end{align*}

or

\begin{align*} s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} &= \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \\ (x_d,y_d) &= L(\tilde{x},\tilde{y}) \end{align*}

where:

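• $$(X,Y,Z)$$ are the coordinates of a 3D point in world space; $$\mathsf{P}$$ is this point in homogeneous coordinates, $$(X,Y,Z,1)^T$$.
• $$(u,v)$$ are the coordinates (in pixels) of the projection of $$(X,Y,Z)$$ on the image plane; $$\mathsf{p}$$ is this projection in homogeneous coordinates, $$(u,v,1)^T$$.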
• $$\mathsf{K}$$ is a 3x3 matrix of intrinsic camera parameters.
• $$[\mathsf{R}|\mathsf{t}]$$ is a 3x4 matrix of extrinsic camera parameters, mapping world space to camera space. It is composed of a 3D rotation $$\mathsf{R}$$ followed by a translation $$\mathsf{t}$$.
• $$(c_x,c_y)$$ is the camera's principal point in pixels, that is, the point where the optical axis intersects the image plane. It is usually at the image center.
• $$f_x,f_y$$ are the camera's horizontal and vertical focal lengths, respectively, expressed in pixel units.
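• $$s$$ is the projective scale factor. Because the last row of $$\mathsf{K}$$ is $$(0,0,1)$$, it equals the depth of the point in camera space.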
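
The projection can be sketched in a few lines of plain Python. Writing the camera-space point as $$(X_c,Y_c,Z_c)^T = \mathsf{R}(X,Y,Z)^T + \mathsf{t}$$, the matrix product expands to $$u = f_x X_c/Z_c + c_x$$, $$v = f_y Y_c/Z_c + c_y$$ and $$s = Z_c$$. The snippet below is only an illustration of the model, not VPI API code: the function name `project_point` and all numeric values (focal lengths, principal point, test point) are assumptions chosen for the example.

```python
def project_point(K, R, t, P):
    """Project a 3D world point P = (X, Y, Z) to pixels with the pinhole model.

    K is the 3x3 intrinsic matrix, R the 3x3 rotation and t the translation
    vector, all given as nested lists. Returns (u, v, s).
    """
    # World space -> camera space: Pc = R * P + t
    Pc = [sum(R[i][j] * P[j] for j in range(3)) + t[i] for i in range(3)]
    # Apply the intrinsics: s * (u, v, 1)^T = K * Pc
    su, sv, s = (sum(K[i][j] * Pc[j] for j in range(3)) for i in range(3))
    if s <= 0:
        raise ValueError("point is not in front of the camera")
    # Perspective division removes the scale factor s
    return su / s, sv / s, s


# Hypothetical camera: fx = fy = 800 px, principal point at (320, 240),
# with camera space aligned to world space (identity rotation, zero translation).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

u, v, s = project_point(K, R, t, [0.5, -0.25, 2.0])
print(u, v, s)  # prints: 520.0 140.0 2.0
```

A point on the optical axis, such as `[0.0, 0.0, 1.0]`, projects to the principal point `(320.0, 240.0)` in this setup. Points with non-positive depth are rejected because the perspective division is undefined or meaningless for them.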