Transformations (camera orientation)

Data location in a buffer

The origin is always located in the upper left corner of the image.
Y axis is directed downward, X axis is directed to the right.
Width and Height — image dimensions along the X and Y axes. The X axis is always scanned by the "fast" index, i.e. the image is stored in memory in rows along the X axis.

The index of a pixel in memory, given its coordinates (x, y), is: index = width * y + x.
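The index formula above can be sketched as a small helper. The names pixel_index and pixel_at are hypothetical and not part of the SDK; the sketch assumes a single-channel, tightly packed buffer with no row padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an SDK function): linear index of pixel (x, y)
// in a row-major buffer, where x is the "fast" index.
inline std::size_t pixel_index(std::size_t width, std::size_t x, std::size_t y)
{
    assert(x < width); // x must stay inside one row
    return width * y + x;
}

// Hypothetical helper: read one pixel of a single-channel image.
// Assumes one byte per pixel and no padding between rows.
inline std::uint8_t pixel_at(const std::vector<std::uint8_t>& buffer,
                             std::size_t width, std::size_t x, std::size_t y)
{
    return buffer.at(pixel_index(width, x, y)); // at() checks the bounds
}
```

For a 640-pixel-wide image, the pixel at x = 3, y = 2 is stored at index 640 * 2 + 3 = 1283.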

Camera orientation options

Orientation: sets the counterclockwise rotation of the image. It has four possible values, given in degrees: 0, 90, 180, 270.
requireMirroring: when set to true, the image is flipped along the X axis (applied after any rotation) before the recognition stage and effect drawing. If the device's front camera produces a mirrored image, you should pass it to the recognizer with requireMirroring = false.
PixelFormat: the binary format of the camera data, e.g. YUV_420 or RGBA.
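To illustrate what a 90-degree counterclockwise rotation means for a row-major buffer with the origin in the upper left corner, here is a minimal sketch. The gray_image type and the rotate_ccw_90 function are hypothetical; the SDK applies the orientation internally, and this code only shows the coordinate mapping for a single-channel image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-channel image, stored in rows along the X axis.
struct gray_image
{
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> data;
};

// Rotate by 90 degrees counterclockwise.
// A source pixel (x, y) lands at (y, width - 1 - x) in the destination,
// whose width is the source height and whose height is the source width.
inline gray_image rotate_ccw_90(const gray_image& src)
{
    assert(src.data.size() == src.width * src.height);
    gray_image dst{src.height, src.width,
                   std::vector<std::uint8_t>(src.data.size())};
    for (std::size_t y = 0; y < src.height; ++y) {
        for (std::size_t x = 0; x < src.width; ++x) {
            const std::size_t dx = y;
            const std::size_t dy = src.width - 1 - x;
            dst.data[dst.width * dy + dx] = src.data[src.width * y + x];
        }
    }
    return dst;
}
```

With this mapping the top right corner of the source becomes the top left corner of the result. Rotations by 180 and 270 degrees can be obtained by applying the function two or three times.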

Default transformation fits by Height

The default transformation fits the Input Frame into the Output Surface by Height. This means the Input Frame is displayed in the Result Frame padded (or cropped) along its width.
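The arithmetic behind fitting by height can be sketched as follows. The fit_result type and the fit_by_height function are hypothetical, and horizontal centering is an assumption made for illustration; the text above does not state how the SDK aligns the frame.

```cpp
#include <cassert>

// Hypothetical result of a fit-by-height computation.
struct fit_result
{
    double scale;        // uniform scale applied to the input frame
    double scaled_width; // input width after scaling
    double offset_x;     // > 0: padding on each side, < 0: cropped on each side
};

// Scale the input so that its height matches the output height.
// Assumption: the scaled frame is centered horizontally.
inline fit_result fit_by_height(int in_w, int in_h, int out_w, int out_h)
{
    assert(in_w > 0 && in_h > 0 && out_w > 0 && out_h > 0);
    const double scale = static_cast<double>(out_h) / in_h;
    const double scaled_width = in_w * scale;
    return {scale, scaled_width, (out_w - scaled_width) / 2.0};
}
```

For example, a 640x480 frame shown on a 1000x960 surface is scaled by 2 to 1280x960, so 140 pixels are cropped on each side. A 480x640 frame on a 1000x1280 surface is scaled to 960x1280 and padded by 20 pixels on each side.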

Custom transformation fits input rectangle inside output rectangle

The custom transformation fits the input image_rect into the output surface (viewport_rect). The input image_rect is inscribed into the viewport_rect by width or by height. The Output Surface may also be flipped along the X or Y axis with the flip_x and flip_y parameters.

You can set the custom transformation with the set_render_transform function. To switch back to the default transformation, call set_render_transform with zero-size rectangles.

set_render_transform(pixel_rect image_rect, pixel_rect viewport_rect, flip_x, flip_y);

Key	Value	Description
image_rect	x, y, w, h	x, y - input rectangle coordinates, w, h - input rectangle size
viewport_rect	x, y, w, h	x, y - output rectangle coordinates, w, h - output rectangle size
flip_x	true, false	true - flip image by x axis, false - default value
flip_y	true, false	true - flip image by y axis, false - default value
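Inscribing one rectangle into another can be sketched as below. The rect_f type and the inscribe function are hypothetical and do not reproduce the SDK's pixel_rect or its internal logic; centering inside the viewport is an assumption.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle type: x, y are the coordinates, w, h the size.
struct rect_f
{
    double x, y, w, h;
};

// Largest rectangle with the aspect ratio of `image` that fits inside
// `viewport`. Assumption: the result is centered in the viewport.
inline rect_f inscribe(const rect_f& image, const rect_f& viewport)
{
    assert(image.w > 0 && image.h > 0);
    // The smaller ratio decides whether the fit is by width or by height.
    const double scale = std::min(viewport.w / image.w, viewport.h / image.h);
    const double w = image.w * scale;
    const double h = image.h * scale;
    return {viewport.x + (viewport.w - w) / 2.0,
            viewport.y + (viewport.h - h) / 2.0, w, h};
}
```

A 640x480 input inscribed into a 1280x1280 viewport is limited by width: it is scaled by 2 to 1280x960 and placed at y = 160.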

Technical details

Transforming coordinates from one image basis to another is an image processing task. Initially, the Face Recognition Engine (FRX) detects the coordinates of facial features on the input image provided by the camera. Then, these coordinates undergo a sequence of transformations so that visual effects can be displayed on the screen.

During this process, the coordinates of the detected landmarks are converted to corresponding dimensions and stored as a part of the frame_data structure.

frame_data includes different data structures. Some of them transform coordinates to specific data structure dimensions. Such transformations are represented by Basis Transformations defined as a part of full_image, frx_y_plane or frx_result structures. The Basis Transformation transfers coordinates from common space (common basis) to a specific space. The image below shows the relations between common basis and destination basis.

For all face recognition coordinate spaces, the Y axis points downward. The origin of a coordinate space may be located at the top left corner or at the center of the basis dimensions.

The SDK defines several coordinate spaces:

Common basis: An intermediate space used to perform conversions between other spaces. This space has the following parameters:

The basis dimensions are defined by the full_roi rectangle stored in the full_image_t event data structure. The basis dimensions depend on the size of the full image, e.g. if the full image is 1280x720, then the common basis space is [-640:640; -360:360], and the center of the coordinate space is at (0,0);
Rotation (orientation): 0 degrees;
Y axis: directed downward;
Mirroring: disabled.
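As a sketch of the parameters listed above, the following code converts between pixel coordinates of an unrotated, unmirrored full image (origin in the top left corner) and the centered common basis. The function names are hypothetical. In the SDK this conversion is carried by the Basis Transformation, which also accounts for orientation and mirroring; the sketch covers only the shift of the origin.

```cpp
#include <cassert>

struct point_f
{
    double x, y;
};

// Hypothetical helper: top-left-origin pixel coordinates -> common basis.
// Assumes rotation 0 and no mirroring, so only the origin moves.
inline point_f pixel_to_common(point_f p, int image_w, int image_h)
{
    assert(image_w > 0 && image_h > 0);
    return {p.x - image_w / 2.0, p.y - image_h / 2.0};
}

// Hypothetical helper: the inverse conversion.
inline point_f common_to_pixel(point_f p, int image_w, int image_h)
{
    assert(image_w > 0 && image_h > 0);
    return {p.x + image_w / 2.0, p.y + image_h / 2.0};
}
```

For a 1280x720 image, the top left pixel corner (0, 0) maps to (-640, -360) and the image center (640, 360) maps to (0, 0).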

frx_y_frame: The coordinate space in which the face analysis engine recognizes facial features. The face analysis engine accepts frames at a resolution of 640x480.

If the scaled image is smaller than the destination image in one or both dimensions, the scaled image is centered within the destination image borders. The remaining area is filled by copying the edges of the frame.
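Centering a smaller image and filling the remainder with copies of its edges can be sketched with clamped sampling. The gray_plane type and the center_with_edge_padding function are hypothetical and are not the SDK's implementation; the sketch assumes a single-channel image that has already been scaled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-channel plane, stored in rows along the X axis.
struct gray_plane
{
    int width;
    int height;
    std::vector<std::uint8_t> data;
};

// Place `src` in the center of a dst_w x dst_h plane. Destination pixels
// outside the source take the value of the nearest source edge pixel.
inline gray_plane center_with_edge_padding(const gray_plane& src,
                                           int dst_w, int dst_h)
{
    assert(src.width > 0 && src.height > 0);
    assert(src.width <= dst_w && src.height <= dst_h);
    assert(src.data.size() ==
           static_cast<std::size_t>(src.width) * src.height);
    const int off_x = (dst_w - src.width) / 2;
    const int off_y = (dst_h - src.height) / 2;
    gray_plane dst{dst_w, dst_h,
                   std::vector<std::uint8_t>(
                       static_cast<std::size_t>(dst_w) * dst_h)};
    for (int y = 0; y < dst_h; ++y) {
        // Clamp the source row so that rows above and below repeat the edge.
        const int sy = std::min(std::max(y - off_y, 0), src.height - 1);
        for (int x = 0; x < dst_w; ++x) {
            const int sx = std::min(std::max(x - off_x, 0), src.width - 1);
            dst.data[static_cast<std::size_t>(dst_w) * y + x] =
                src.data[static_cast<std::size_t>(src.width) * sy + sx];
        }
    }
    return dst;
}
```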

frx_result: The space used to store face recognition results for further processing.

This space is the frx_y_frame space rotated by 90 degrees counterclockwise and shifted by FRX_WIDTH (480) pixels to compensate for the rotation offset (so that the head in the image is directed upward).

full_image: The coordinate space of the full frame acquired from the camera.

Transformations between these spaces are always performed via the common basis. In other words, to transform coordinates from frx_result space to full_image space, you need to multiply the inverse transformation of frx_result space (a 3x3 matrix) by the Basis Transformation of full_image space (also a 3x3 matrix). Schematically, it looks like this: frx_result.basis_transform.inverse() * full_image.basis_transform. Then, to perform the transformation, multiply the coordinates in the source space by the destination transform matrix using the special transform method. See the examples below for more details.

Please note that transform stores the transformation matrix in row-based order, so the sequence of transformations reads from left to right.
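The idea of going through the common basis can be sketched with a toy matrix type. Everything here is hypothetical: mat3 is a stand-in for the SDK transformation type, not its real implementation. The sketch assumes 2D affine matrices stored row by row and points treated as row vectors, so that a >> b means "apply a, then b".

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct point_f
{
    double x, y;
};

// Hypothetical stand-in for a basis transformation: a 2D affine
// transform as a 3x3 matrix in row-major order, used with row vectors.
struct mat3
{
    std::array<double, 9> m;

    // Scale by (sx, sy), then translate by (tx, ty).
    static mat3 scale_translate(double sx, double sy, double tx, double ty)
    {
        return {{sx, 0, 0,
                 0, sy, 0,
                 tx, ty, 1}};
    }

    // Apply to a point: [x y 1] * M.
    point_f transform(point_f p) const
    {
        return {p.x * m[0] + p.y * m[3] + m[6],
                p.x * m[1] + p.y * m[4] + m[7]};
    }

    // Inverse of an affine matrix (the last column is 0, 0, 1).
    mat3 inverse() const
    {
        const double det = m[0] * m[4] - m[1] * m[3];
        assert(std::fabs(det) > 1e-12);
        const double a = m[4] / det, b = -m[1] / det;
        const double c = -m[3] / det, d = m[0] / det;
        return {{a, b, 0,
                 c, d, 0,
                 -(m[6] * a + m[7] * c), -(m[6] * b + m[7] * d), 1}};
    }
};

// a >> b: matrix product a * b, i.e. apply a first and b second.
inline mat3 operator>>(const mat3& a, const mat3& b)
{
    mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r.m[3 * i + j] += a.m[3 * i + k] * b.m[3 * k + j];
    return r;
}
```

As a usage example, let small_basis = mat3::scale_translate(0.5, 0.5, 320, 180) map the common basis of a 1280x720 image to a 640x360 space, and full_basis = mat3::scale_translate(1, 1, 640, 360) map it to full image pixels. Then small_basis.inverse() >> full_basis converts the point (320, 180) of the small space to (640, 360) in the full image, passing through (0, 0) in the common basis.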

Example: frx_result to full_frame

auto transform_to_full_frame = frx_rec_res->basis_transform.inverse() >> full_image.basis_transform;

Note: operator >> denotes multiplication of the source transform matrix by the destination one.

This reads as: transform frx_result to the common basis, and then transform it to the full_image space.

Example: frx_result to OpenGL (OGL) space projection [-1:1; -1:1]

   auto proj_mat = glm::make_mat4(frx_rec_res->faces[0].camera_position.projection_m);
   bnb::pixel_rect target(-1, -1, 2, 2);
   int frx_w = bnb::constants::FRX_FRAME_W;
   int frx_h = bnb::constants::FRX_FRAME_H;
   // Conversion without taking borders into account
   auto gl_to_frx =
       bnb::transformation(target, bnb::pixel_rect{frx_w, frx_h},
                           bnb::transformation::rotate_t::deg_0, false, true);
   auto frx_to_gl =
       frx_rec_res->basis_transform.inverse() >>
       bnb::transformation(full_img->full_roi, target,
                           bnb::transformation::rotate_t::deg_0, false, true);

   auto mask_gl_to_gl = gl_to_frx >> frx_to_gl;
   // By default, the transformation covers only two coordinates, x and y,
   // while GL operates on three (x, y, z), so extend the matrix here:
   // xy -> xyz
   auto glmat3 = glm::make_mat3(mask_gl_to_gl.transposed_data().data());
   glm::mat4 glmat4(glmat3);
   // Move the third column of the 3x3 matrix into the fourth column and
   // turn the z column into an identity column.
   std::swap(glmat4[2], glmat4[3]);
   std::swap(glmat4[2][2], glmat4[2][3]);
   std::swap(glmat4[3][2], glmat4[3][3]);
   proj = glmat4 * proj_mat;

This reads as: transform OGL space dimensions to frx_result dimensions (no orientation fix), flipping the Y axis; transform the result to the common basis, taking into account the frx frame border size and orientation; then transform it back to OGL space without rotation, but flipping the Y axis; finally, extend the matrix to support 3D space conversions.

Example: OGL space to frx_result space

   pixel_rect target(-1, -1, 2, 2);
   int frx_w = bnb::constants::FRX_FRAME_W;
   int frx_h = bnb::constants::FRX_FRAME_H;
   auto frx_frame_to_gl = transformation(pixel_rect{frx_w, frx_h},
       target, transformation::rotate_t::deg_0, false, true);
   auto frx_to_input_frame =
       data.get_data<frx_y_plane>().basis_transform.inverse() >>
       frx_rec_res->basis_transform;
   auto frx_to_gl = frx_to_input_frame >> frx_frame_to_gl;
   auto gl_to_frx = frx_to_gl.inverse();

This reads as: transform frx_result to OGL space dimensions (no orientation changes) and fix the orientation via the common space by transforming to the FRX input frame dimensions. This gives the transformation from frx_result space to OGL space; invert it to get the required transformation from OGL space to frx_result space.

Example: frx_result space to normalized basis [0:1; 0:1], with the origin at the top left corner

   // Because of how pixel_rect works, the width and height must be
   // greater than or equal to 2 (see the code of bnb::transformation(...)
   // for details).
   bnb::pixel_rect target(-1, -1, 2, 2);
   auto frx_frame_to_ndc = bnb::transformation(full_image->full_roi,
     target, bnb::transformation::rotate_t::deg_0, false, false);
   // The last transformation is necessary to get the final normalized
   // basis dimensions.
   auto frx_to_ndc = frx_rec_res->basis_transform.inverse() >> frx_frame_to_ndc >>
     bnb::transformation(0.5, 0.5, 0.5, 0.5);

This reads as: build the transformation from the common basis to the OGL basis without flipping the Y axis; transform the frx_result basis to the common basis, transform the result to OGL space, and then transform OGL space to the normalized basis.
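The last step, from OGL space [-1:1; -1:1] to the normalized basis [0:1; 0:1], amounts to halving each coordinate and shifting it by 0.5. The sketch below shows only this arithmetic. The ndc_to_unit function is hypothetical, and reading the four 0.5 arguments in the example above as a scale and an offset per axis is an assumption, not something stated by the SDK.

```cpp
struct point_f
{
    double x, y;
};

// Hypothetical helper: map a point from [-1:1; -1:1] to [0:1; 0:1].
// Assumption: this matches the effect of the final 0.5/0.5 step above.
inline point_f ndc_to_unit(point_f p)
{
    return {p.x * 0.5 + 0.5, p.y * 0.5 + 0.5};
}
```

The corner (-1, -1) maps to (0, 0), the center (0, 0) to (0.5, 0.5), and the corner (1, 1) to (1, 1).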