Row-major vs. column-major matrices

A transformation matrix can be written down in two ways, commonly called row-major and column-major. The two forms are transposes of each other.

A translation matrix defined in column-major fashion looks like this:

$$ T(t_x,t_y,t_z) = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

A translation matrix defined in row-major fashion looks like this:

$$ T(t_x,t_y,t_z) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ t_x & t_y & t_z & 1 \end{pmatrix} $$

I stick to column-major matrices here. Always make sure which of the two layouts you are using. Further confusion can arise when the math assumes a column-major layout but the matrix is stored in memory in row-major order.

If you are working with column-major matrices, you treat your vectors as column vectors, i.e.:

$$ T(t_x,t_y,t_z) \cdot v = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} x + t_x \\ y + t_y \\ z + t_z \\ w \end{pmatrix} $$

If you work with the row-major layout, vectors are treated as row vectors:

$$ v \cdot T(t_x,t_y,t_z) = \begin{pmatrix} x & y & z & w \\ \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ t_x & t_y & t_z & 1 \end{pmatrix} = \begin{pmatrix} x + t_x & y + t_y & z + t_z & w \\ \end{pmatrix} $$
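The relationship between the two conventions can be checked numerically. The following is a minimal sketch in plain Python with nested lists; all helper names (`translation`, `mat_vec`, `vec_mat`, `flatten_row_major`, `flatten_column_major`) are made up for this illustration. It shows that the row-vector form is the transpose of the column-vector form, and that the order in which the 16 values are stored in memory is a separate decision.

```python
def translation(tx, ty, tz):
    """Translation matrix for column vectors (offsets in the last column)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(column) for column in zip(*m)]


def mat_vec(m, v):
    """Compute M * v, treating v as a column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def vec_mat(v, m):
    """Compute v * M, treating v as a row vector."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]


def flatten_row_major(m):
    """Store the matrix row by row in a flat list."""
    return [m[i][j] for i in range(4) for j in range(4)]


def flatten_column_major(m):
    """Store the matrix column by column in a flat list."""
    return [m[i][j] for j in range(4) for i in range(4)]


T = translation(2.0, 3.0, 4.0)
p = [1.0, 1.0, 1.0, 1.0]

# Both conventions describe the same transformation.
column_result = mat_vec(T, p)
row_result = vec_mat(p, transpose(T))

# The position of t_x in memory depends only on the storage order.
tx_index_row_major = flatten_row_major(T).index(2.0)
tx_index_column_major = flatten_column_major(T).index(2.0)
```

Both `column_result` and `row_result` hold the translated point. In this sketch, storing the column-vector matrix column by column produces the same flat list as storing its transpose row by row, which is one reason the two notions are easy to mix up.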

Homogeneous coordinates

Using 4x4 matrices has several advantages. For instance, it allows us to express translations as matrices. With a 3x3 matrix it would not be possible to describe a translation of a 3D point. Since a 4x4 matrix needs to be multiplied by a vector with 4 elements, we simply add a fourth coordinate to our points, normals and vectors. This fourth component is called the $w$ coordinate. For points it is $1$; for vectors and normals it is $0$. This way only points are affected by translations, while vectors and normals are not.
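The effect of the $w$ coordinate can be seen in a small sketch. It uses plain Python lists and the column-vector convention; the helper names `translation` and `mat_vec` are chosen for this example only.

```python
def translation(tx, ty, tz):
    """Translation matrix for column vectors."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_vec(m, v):
    """Compute M * v for a 4x4 matrix and a 4-element column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


T = translation(5.0, 0.0, 0.0)

point = [1.0, 2.0, 3.0, 1.0]      # w = 1: a position
direction = [1.0, 2.0, 3.0, 0.0]  # w = 0: a direction

moved_point = mat_vec(T, point)          # the translation is applied
moved_direction = mat_vec(T, direction)  # the translation is ignored
```

Because the translation column is multiplied by $w$, the point is shifted while the direction keeps its original components.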

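Please note that normals always have to be transformed by the inverse transpose of the transformation matrix. Otherwise they do not stay perpendicular to the surface, for example under non-uniform scaling.

Another benefit of 4D points and vectors is that projections can also be expressed as matrices.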
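Why the inverse transpose is needed can be illustrated with a non-uniform scaling. The sketch below works on the upper-left 3x3 part only and writes the inverse transpose by hand, which is easy for a diagonal matrix; a general matrix would need a proper inversion routine. The names `mat3_vec3` and `dot` are illustrative helpers.

```python
def mat3_vec3(m, v):
    """Compute M * v for a 3x3 matrix and a 3-element column vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def dot(a, b):
    """Dot product of two 3-element vectors."""
    return sum(x * y for x, y in zip(a, b))


sx, sy, sz = 2.0, 1.0, 1.0

# Non-uniform scaling and its inverse transpose (diagonal, so trivial here).
S = [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, sz]]
S_inverse_transpose = [[1.0 / sx, 0.0, 0.0], [0.0, 1.0 / sy, 0.0], [0.0, 0.0, 1.0 / sz]]

tangent = [1.0, 1.0, 0.0]   # lies in the surface
normal = [1.0, -1.0, 0.0]   # perpendicular to the tangent

new_tangent = mat3_vec3(S, tangent)
wrong_normal = mat3_vec3(S, normal)                     # same matrix as the surface
right_normal = mat3_vec3(S_inverse_transpose, normal)   # inverse transpose

wrong_dot = dot(new_tangent, wrong_normal)  # non-zero: no longer perpendicular
right_dot = dot(new_tangent, right_normal)  # zero: still perpendicular
```

The correctly transformed normal is generally not of unit length anymore, so it usually has to be normalized again afterwards.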

Coordinate Systems

Before talking about perspective projection it makes sense to talk about coordinate systems. The following coordinate system is assumed in this article:

[Figure: left-handed coordinate system]

It is a left-handed coordinate system. The x-axis goes from left to right, the y-axis from bottom to top, and the z-axis from front to back. It is assumed that the virtual camera looks along the positive z-axis. When working in a 3D context, you should always ask yourself how the underlying coordinate system is defined to prevent confusion.

Other render systems sometimes use a different naming and orientation of the axes. This can lead to serious problems, such as mirrored 3D models. For instance, Wavefront .obj files use a right-handed coordinate system to specify vertex positions. A right-handed coordinate system is shown in the following image.

[Figure: right-handed coordinate system]

A left-handed coordinate system is used, for example, by Maxon’s Cinema4D (1), the famous PBRT ray tracer (2) and, last but not least, Intel’s Embree library.

A right-handed coordinate system is used for example by early OpenGL versions or by the educational Nori ray tracer.

A positive direction of rotation in a left-handed coordinate system is clockwise, whereas in a right-handed system a positive direction of rotation is counter-clockwise.
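The mirroring problem mentioned above for right-handed data can be made concrete. One common way to move geometry between a right-handed and a left-handed system is to negate the z coordinate. Since this is a mirroring, the vertex order of each triangle also has to be reversed so that the face normal computed from the vertices still matches the mirrored surface. The following sketch assumes exactly this conversion; the function names are hypothetical and not taken from any library.

```python
def flip_z(vertex):
    """Mirror a 3D position along the z-axis."""
    x, y, z = vertex
    return (x, y, -z)


def convert_triangle(triangle):
    """Flip z on all vertices and reverse the vertex order (winding)."""
    a, b, c = (flip_z(v) for v in triangle)
    return (a, c, b)


def face_normal(triangle):
    """Unnormalized face normal (b - a) x (c - a)."""
    a, b, c = triangle
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


triangle = ((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))

original_normal = face_normal(triangle)

# Flipping z without touching the vertex order leaves the computed normal
# unchanged, although the surface has been mirrored.
mirrored_only = tuple(flip_z(v) for v in triangle)
inconsistent_normal = face_normal(mirrored_only)

# Flipping z and reversing the order gives the mirrored normal.
converted = convert_triangle(triangle)
converted_normal = face_normal(converted)
```

In this example the original normal points along positive z, and the converted triangle yields a normal along negative z, which is the mirrored direction. Skipping the reordering is a typical source of inside-out or wrongly lit models.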

The virtual camera is usually defined with respect to a camera coordinate system. The idea is that the eye point is always at $(0,0,0)$ and the camera's view axis always runs along the z-axis.

Of course, it makes no sense to define the objects in our scene with respect to the camera coordinate system. Usually, objects are defined in a world space coordinate system and transformed to the camera coordinate system before rendering. This process was already explained in one of my previous posts.

The goal of the perspective projection is to map the 3D scene to a 2D image. The 2D image is typically described as a 2D raster graphic (pixel graphic) that has its own coordinate system. Here the x-axis goes from left to right and the y-axis from top to bottom. The following image shows a $5 \times 3$ raster image with the corresponding coordinates.

[Figure: raster coordinate system]

This coordinate system is typically used by raster image formats such as OpenEXR.

During perspective projection, camera coordinates are usually transformed to normalized device coordinates. Normalized device coordinates range over $[-1,1]$ on the x- and y-axes. These coordinates must then be mapped to the raster image.
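The mapping from normalized device coordinates to the raster image could look like the following sketch. It assumes that the y-axis of the normalized device coordinates points up, that the raster origin is the top-left corner, and that raster x runs from left to right; other systems may differ. The function names `ndc_to_raster` and `raster_to_pixel` are made up for this example.

```python
import math


def ndc_to_raster(x_ndc, y_ndc, width, height):
    """Map coordinates in [-1, 1] to continuous raster coordinates.

    Assumes NDC y points up and raster y points down, so y is flipped.
    """
    x_raster = (x_ndc + 1.0) * 0.5 * width
    y_raster = (1.0 - y_ndc) * 0.5 * height
    return x_raster, y_raster


def raster_to_pixel(x_raster, y_raster, width, height):
    """Pick the pixel that contains a raster position, clamped to the image."""
    column = min(max(int(math.floor(x_raster)), 0), width - 1)
    row = min(max(int(math.floor(y_raster)), 0), height - 1)
    return column, row


width, height = 5, 3  # the 5 x 3 raster image used as an example in the text

top_left = ndc_to_raster(-1.0, 1.0, width, height)
center = ndc_to_raster(0.0, 0.0, width, height)
bottom_right = ndc_to_raster(1.0, -1.0, width, height)

center_pixel = raster_to_pixel(*center, width, height)
last_pixel = raster_to_pixel(*bottom_right, width, height)
```

For the $5 \times 3$ image, the center of the normalized device coordinates lands at raster position $(2.5, 1.5)$, which is the middle of pixel $(2, 1)$. The clamp keeps positions on the right and bottom border inside the image.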

Basic transformations

Translations

$$ T(t_x,t_y,t_z) \cdot v = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} x + t_x \\ y + t_y \\ z + t_z \\ w \end{pmatrix} $$

Scaling

Uniform Scaling:

$$ S(s) \cdot v = \begin{pmatrix} s & 0 & 0 & 0 \\ 0 & s & 0 & 0 \\ 0 & 0 & s & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} x \cdot s \\ y \cdot s \\ z \cdot s \\ w \end{pmatrix} $$

Non-Uniform Scaling:

$$ S(s_x, s_y, s_z) \cdot v = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} x \cdot s_x \\ y \cdot s_y \\ z \cdot s_z \\ w \end{pmatrix} $$
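As a minimal sketch, the scaling matrices can be built and applied as follows. Uniform scaling is treated here as the special case of non-uniform scaling with three equal factors. The helper names `scaling`, `uniform_scaling` and `mat_vec` are assumptions made for this example.

```python
def scaling(sx, sy, sz):
    """Non-uniform scaling matrix for column vectors."""
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy, 0.0, 0.0],
        [0.0, 0.0, sz, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def uniform_scaling(s):
    """Uniform scaling as a special case of non-uniform scaling."""
    return scaling(s, s, s)


def mat_vec(m, v):
    """Compute M * v for a 4x4 matrix and a 4-element column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


point = [1.0, 2.0, 3.0, 1.0]

scaled_uniform = mat_vec(uniform_scaling(2.0), point)
scaled_non_uniform = mat_vec(scaling(2.0, 3.0, 4.0), point)
```

In both cases the $w$ coordinate is left unchanged, so a point stays a point and a vector stays a vector.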

Rotation

Rotation around x-axis:

$$ R_x(\alpha) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos{\alpha} & -\sin{\alpha} & 0 \\ 0 & \sin{\alpha} & \cos{\alpha} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

Rotation around y-axis:

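$$ R_y(\alpha) = \begin{pmatrix} \cos{\alpha} & 0 & \sin{\alpha} & 0 \\ 0 & 1 & 0 & 0 \\ -\sin{\alpha} & 0 & \cos{\alpha} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$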

Rotation around z-axis:

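$$ R_z(\alpha) = \begin{pmatrix} \cos{\alpha} & -\sin{\alpha} & 0 & 0 \\ \sin{\alpha} & \cos{\alpha} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$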
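The following sketch builds the three rotation matrices in their standard form for column vectors, with the angle given in radians, and checks them with quarter turns. The function names and the tolerance used in `approx_equal` are choices made for this example.

```python
import math


def rotation_x(alpha):
    """Rotation around the x-axis by alpha radians."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, c, -s, 0.0],
        [0.0, s, c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def rotation_y(alpha):
    """Rotation around the y-axis by alpha radians."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [
        [c, 0.0, s, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [-s, 0.0, c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def rotation_z(alpha):
    """Rotation around the z-axis by alpha radians."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [
        [c, -s, 0.0, 0.0],
        [s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_vec(m, v):
    """Compute M * v for a 4x4 matrix and a 4-element column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def approx_equal(a, b, tolerance=1e-9):
    """Compare two vectors component-wise, allowing for rounding errors."""
    return all(math.isclose(x, y, abs_tol=tolerance) for x, y in zip(a, b))


quarter_turn = math.pi / 2.0

# Each rotation leaves its own axis fixed and maps one of the
# remaining axes onto the other.
y_rotated_about_x = mat_vec(rotation_x(quarter_turn), [0.0, 1.0, 0.0, 0.0])
z_rotated_about_y = mat_vec(rotation_y(quarter_turn), [0.0, 0.0, 1.0, 0.0])
x_rotated_about_z = mat_vec(rotation_z(quarter_turn), [1.0, 0.0, 0.0, 0.0])
```

With these matrices a quarter turn around the x-axis maps the y-axis onto the z-axis, a quarter turn around the y-axis maps the z-axis onto the x-axis, and a quarter turn around the z-axis maps the x-axis onto the y-axis. The results are compared with a tolerance because `math.cos(math.pi / 2)` is not exactly zero in floating point.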