Non-linear Video Resizing

In the course CS576 Multimedia System Design, we are assigned to implement a video resampling program. That is not just linear resizing that scales each pixel proportionally though linear resizing is also a part of it. Our task is to build four resizing strategies:

  • linear-mapping resizing;
  • non-linear-remapping for peripheral areas;
  • content-aware remapping;
  • temporal content-aware remapping.

The non-linear remapping part took me quite a lot of time to find a good non-linear function, so in this blog post I’ll write about how I get to it.

What is non-linear video resizing?

If we have a video with 960*540 size (aspect ratio: 1.777:1) and hope to resize it, as long as our goal has the same aspect ratio, all the pixels would remain unstretched. However, when our width and height scaling factors differs, there is a pixel stretching effect, which is not desirable. One smart improvement is to make non-linear remapping according to where the pixels are on the screen.

The non-linear remapping we talk about here is controlled non-linearly such that the larger part of the central area of the video (which is the focus of attention) has the same original pixel aspect, and the peripheral areas are scaled to compensate. An example is shown as below.

An example of non-linear resizing

Find a non-linear remapping function

The key to make the conversion is to find a remapping function that remaps positions $(x’, y’)$ in our resized video to a position $(x, y)$ in the original video, so that pixel at $(x’, y’)$ gets the color from pixel $(x, y)$ in the original video frame. Apparently, in the linear resizing, the remapping function from $x’$ to $x$ (or from $y’$ to $y$) is a linear function (say, $x’=f(x)=kx$ where $k>0$), as is shown in the left graph below. For the non-linear resizing described previously, we need to find a function that somewhat remaps positions like the right graph below.

Linear and Non-Linear Remapping

Here’s why. Suppose we are to resize a video 0.8 of its original width and 0.5 of its original height. Since the width has a bigger scaling factor, the width will be remapped non-linearly with only pixels at the central width area having its original pixel aspect. Suppose we want $x_S$ width positions in a resized frame from $x_E$ width positions in an original frame. The gray range in the right graph shows the linearly stretched central width positions, while the peripheral parts beside the gray area are stretched more in a non-linear way so that the number of positions becomes $x_S$.

I would like to move the coordinate system a little to show the remapping better, as shown below.

Non-linear Remapping with Re-arranged Coordinate

To make this peripheral non-linear stretching visually better, the non-linear remapping function $x’=f(x)$ should follow three rules:

  1. The original position where non-linear remapping ends should be the end of the all original positions, i.e. $f\left(\frac{x_S}{2}\right)=\frac{x_E}{2}$.
  2. The original position where non-linear remapping starts should be the ending position of linear remapping, i.e. $f\left(\frac{x_T}{2}\right)=k\frac{x_T}{2}$, where $k=\frac{x_E}{x_R}$.
  3. The rate of change at the position where non-linear remapping starts should be the same as where linear remapping ends, i.e. $f’\left(\frac{x_T}{2}\right)=k$, where $k=\frac{x_E}{x_R}$.

Following these rules, we can basically choose any functions that looks like what shows in the above graph. However, this is not an easy process, since we need to calculate the exact function with $x_E$, $x_S$, $x_T$, $x_R$ given so that we can implement it in our resizing algorithm. Intuitively, I tried the following functions and finally got one solution.

Logarithmic function

A logarithmic function is the most intuitive function I think about from my audio/video editing experiences. We can have

and

Ideally, we could solve $a$, $b$, $c$ (more than one answers) and have this function as the non-linear part of our remapping function. However, it is so hard to solve this system of equations, even with the help of a computer. (Maybe I could have implemented an iterative method for solutions. Didn’t try. :D) That’s why I moved to quadratic function.

Quadratic function

A quadratic function can also look like what is shown in the graph. Similarly, having $f(x)=ax^2+bx+c \ (a \neq 0)$ and the three rules, we try to figure out $a$, $b$, $c$. It is not hard to solve $a$, $b$, $c$ this time — we will have

However, after implementing this and seeing a few undesirable results, I realized that the quadratic function we get is not always the one we want since $(\frac{x_S}{2}, \frac{x_E}{2})$ might be at the right of the vertex. In these cases, some $f(x)$ values exceeds $\frac{x_E}{2}$, the last position we have in the original frame, which makes remapping undoable. An ideal and an unwanted case are shown in the graph below. You can also play with an unwanted example using Desmos here.

Possible Quadratic Functions

Since there is only one solution set for $a$, $b$, $c$, an unwanted quadratic function would be a final one. With this failure, I went on to Bezier Curve.

Bézier Curve

I got to know Bézier curve before as a drawing tool at Adobe Illustrator or somewhere but honestly did not know much about it. Basically, it is a curve defined by some control points. A further look at it shows it can be written as parametric equations with known control points.

According to our task, a quadratic Bézier curve with three control points could be a good idea. The first and last control points of a Bézier curve are always the end points. Thus two of the control points must be $\mathbf{P}_0=(\frac{x_T}{2}, k\frac{x_T}{2})$ and $\mathbf{P}_2=(\frac{x_S}{2}, \frac{x_E}{2})$. We can find another control point $\mathbf{P}_1$ somewhere near $\mathbf{P}_0$ on $f(x)=kx$ to ensure the rate of change at $\mathbf{P}_0$ to be $k$ ($k=\frac{x_E}{x_R}$). Thus, we get the Bézier curve:

Experiments show that a $\mathbf{P}_1$ that is too far away from $\mathbf{P}_0$ can introduce a problem similar to the one using quadratic function — you can try change $\mathbf{P}_1$ here to see how the curve turns. I finally chose $\mathbf{P}_1$ to be $\left(\frac{T}{2}+\frac{\left(R-T\right)}{10},\ k\left(\frac{T}{2}+\frac{\left(R-T\right)}{10}\right)\right)$, as shows below.

We should write the Bézier curve in another $x’=f(x)$ form so that we can implement it — with a known $x$, we need to calculate the corresponding $t$ to get the remapping result $x’$.

Let $\mathbf {B}_x (t)=x$, we get

Let $a=\mathbf{P}_2.x-\mathbf{P}_0.x-2\mathbf{P}_1.x$, $b=2\mathbf{P}_1.x$ and $c=\mathbf{P}_0.x-x$. Thus, $t$ is one of the two solutions that satisfies $0 < t < 1$ in

With $t$, we calculate the remapping result $x’=\mathbf {B}_{x’} (t)$.

Knowing the Bézier curve as the non-linear part of the remapping function, together with the linear part $f(x)=kx$, we can finally remap any given position to a position in the original frame.

Implementation

The quadratic Bézier curve discussed above is a valid remapping function to implement. I created a class that does the remapping. It calculates $\mathbf{P}_0$, $\mathbf{P}_1$, and $\mathbf{P}_2$ when initialized, has a private remapping function equivalent to what we talked about that calculates the $f(x)$ for a given $x$, and has a public function that calls it and rearranges the coordinates.

class NonLinearStretcher {
    private double xE;
    private double xR;
    private double xS;
    private double xT;
    private double k;
    private double X_0;
    private double Y_0;
    private double X_1;
    private double Y_1;
    private double X_2;
    private double Y_2;

    public NonLinearStretcher(int originalLen, int resizedLen, int stretchedLen) {
        this.xE = originalLen;
        this.xR = resizedLen;
        this.xS = stretchedLen;
        // Use 0.4 of the resizedLen as the central part.
        this.xT = (int) (0.4 * resizedLen);
        this.k = xE * 1.0 / xR;
        this.X_0 = xT / 2;
        this.Y_0 = k * X_0;
        this.X_1 = xT / 2 + (xR - xT) / 10;
        this.Y_1 = k * X_1;
        this.X_2 = xS / 2;
        this.Y_2 = xE / 2;
    }

    private double bezier_x(double t) {
        return (1 - t * t) * X_0 + 2 * (1 - t) * t * X_1 + t * t * X_2;
    }

    private double bezier_fx(double t) {
        return (1 - t * t) * Y_0 + 2 * (1 - t) * t * Y_1 + t * t * Y_2;
    }

    private double nonLinearFunction(int x) {
        if (x <= (xT / 2.0)) return (k * x);
        else if (x == xS / 2) return xE / 2;
        else {
            // Using Bezier Curve for the non-linear part.
            // First, calculate t from x.
            double a = X_2 - X_0 - 2 * X_1;
            double b = 2 * X_1;
            double c = X_0 - x;
            double t = (-b + Math.sqrt(b * b - 4 * a * c)) / (2 * a);
            if (t <= 0 || t >= 1) t = (-b - Math.sqrt(b * b - 4 * a * c)) / (2 * a);
            // Then, calculate fx from t.
            double fx = bezier_fx(t);
            return fx;
        }
    }

    public int getPosBeforeStretched(int pos) {
        int t = pos - (int) (xS / 2.0);
        double tNew = nonLinearFunction(Math.abs(t));
        if (t < 0) tNew = -tNew;
        int posBeforeStretched = (int) (tNew + (xE / 2.0));
        return posBeforeStretched;
    }
}

Resizing a frame with bigger width scaling factor than height scaling factor, we stretch all the width positions x using x' = getPosBeforeStretched(int x) and keep all the height positions y linearly y' = y / Math.min(widthScaling, heightScaling). Filling each (x, y) with the color of (x', y') from the original frame finally makes a resized frame. All the resized frames constitute a resized video.

Example

If we resize a 960*540 video to 0.8 of its width and 0.5 of its height, then the resized video should be 768*270. The non-linear remapping for the horizontal pixel positions should be:

0 -> 0
1 -> 0
2 -> 0
3 -> 0
4 -> 0
5 -> 1
6 -> 1
7 -> 1
8 -> 2
9 -> 2
...
288 -> 288
289 -> 290
290 -> 292
...

The linear remapping for the vertical pixel positions should be:

0 -> 0
1 -> 2
2 -> 4
...

The whole remapping log for this example can be found here.