Boost Object Detection with Data Augmentation: Master Rotation & Shearing

Data augmentation techniques like rotation and shearing can substantially improve object detection accuracy.


Introduction

To improve object detection accuracy, data augmentation techniques like rotation and shearing play a key role. These transformations help models recognize objects from multiple angles and perspectives, making them more robust in real-world scenarios. Rotation prevents overfitting by allowing the model to handle varying object orientations, while shearing simulates perspective distortions that are commonly seen in images. In this article, we’ll explore how to effectively implement these techniques and adjust bounding boxes to ensure accuracy, ultimately enhancing the performance and reliability of object detection models.

What Are Rotation and Shearing Techniques?

Rotation and shearing are image transformation techniques used in data augmentation to improve object detection models. These techniques help the model recognize objects from different angles and perspectives, making it more adaptable and accurate in real-world situations. Rotation ensures the model can detect objects in various orientations, while shearing simulates perspective distortions. Both techniques expand the dataset artificially, reducing overfitting and increasing the model’s robustness without requiring additional labeled data.

Prerequisites

Before you jump into the exciting world of bounding box augmentation using rotation and shearing, there are a few concepts and tools you’ll want to get familiar with. Think of them as the secret ingredients that will bring your object detection magic to life. Let’s break it down, step by step.

Basic Understanding of Image Augmentation

Imagine you have a big pile of images you want to use to train your object detection model. But there’s a catch—you don’t have enough images to cover every possible situation. That’s where image augmentation steps in. It’s like taking what you’ve got and making it better. By applying transformations like rotation, flipping, and scaling, you can expand your dataset, making your model smarter and more adaptable.

But here’s the thing—you need to understand how each transformation changes your images. For example, when you rotate an image or flip it upside down, how does that affect the objects in the picture? You need to know this to make sure your transformations work smoothly.

Bounding Boxes

Now, let’s talk about the star of the show in object detection: bounding boxes. These are the labels that tell the model what objects are in the image. A bounding box is defined by its top-left and bottom-right corners, usually represented as coordinates (x_min, y_min, x_max, y_max). You’ll spot them as rectangular outlines drawn around objects in the images.

When you apply transformations like rotation or shearing, these boxes change shape and position. So, knowing how to adjust them is super important. If you don’t adjust the boxes correctly, your model might get confused, and that’s definitely something we want to avoid.
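
To make this concrete, here’s a minimal sketch, with made-up coordinates and image width, of how one simple transformation (a horizontal flip) remaps a box stored in the (x_min, y_min, x_max, y_max) format:

import numpy as np

# One bounding box in (x_min, y_min, x_max, y_max) format (made-up values).
box = np.array([50, 30, 120, 90])

# After flipping a 200-pixel-wide image horizontally, the x-coordinates
# mirror around the image width and swap roles.
W = 200
flipped = np.array([W - box[2], box[1], W - box[0], box[3]])
print(flipped)  # [ 80  30 150  90]

Notice that x_min and x_max trade places after mirroring; that kind of bookkeeping is exactly what the rest of this article automates for rotation and shearing.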

Coordinate Geometry

When you start twisting and turning images with rotation or skewing them with shearing, you’ll quickly notice that everything in the image shifts around. So how do you figure out where everything ends up after you spin or stretch the picture? That’s where a bit of coordinate geometry comes in.

It’s about understanding how points (like the corners of your bounding boxes) are positioned in 2D space. When you rotate or shear an image, you’ll need to know how to calculate the new positions of these points so that the bounding boxes line up correctly with the objects. Think of it like a treasure map: knowing where the “X” is helps you find the hidden treasure—and in this case, that “X” is the correctly aligned bounding box.
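
If you want to see the geometry in action, here’s a tiny sketch (the angle and point are arbitrary) of rotating a single corner about the origin with a standard 2D rotation matrix. In the sections below, OpenCV builds a similar matrix that also handles rotating about the image center:

import numpy as np

theta = np.deg2rad(30)  # rotate by 30 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

corner = np.array([100.0, 50.0])  # a bounding-box corner (x, y)
rotated = R @ corner              # its position after rotating about the origin
print(rotated)                    # roughly [61.6, 93.3]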

Python and NumPy

Finally, we come to the dynamic duo of programming for this task: Python and NumPy. These two are your best friends when handling image data. Python is a super versatile language, perfect for writing scripts to apply transformations to your images and update the bounding box coordinates. NumPy goes a step further by making numerical operations like matrix multiplications and array manipulations a piece of cake.

Whether you’re rotating an image or adjusting bounding box dimensions, NumPy makes the math behind it all smooth and efficient. Getting comfortable with these tools will help you apply the augmentations described here without breaking a sweat.

With these building blocks in place, you’re all set to start applying rotation and shearing augmentations to make your object detection models even stronger. Ready to jump in?

Learning OpenCV 3 (2018)

GitHub Repo

Imagine you’re deep into your machine learning project, working on making your object detection models even better. You’re testing out data augmentation techniques like rotation and shearing, and you’re eager to see the results. This is where the companion GitHub repository comes in.

Inside it, you’ll find the full data augmentation library, including all the rotation and shearing techniques discussed in this article, along with other handy functions for object detection that keep your bounding boxes aligned with your transformed images. Image transformations can get messy if you’re not careful, but the code here is set up and ready to go: use it as-is, or adjust it to suit your needs. Want to give your rotation technique a little extra twist or experiment with the shearing factor? Go ahead—the code is your playground.

The repository also ships with clear, easy-to-follow documentation, including step-by-step instructions for applying the transformations. Whether you’re training on new datasets or fine-tuning performance, it’s a solid place to start, and a quick way to see how these techniques affect the accuracy and robustness of your models.

Scikit-image GitHub Repository

Rotation

Let’s set the scene: You’re working on an object detection project, and now, you need to make your model smart enough to recognize objects from various angles. But how do you teach it to look at objects in different orientations? Well, that’s where rotation comes into play, and trust me, it’s not as simple as just flipping a picture. Rotation in data augmentation is one of the trickier moves because when you twist an image, everything in the picture moves—objects, pixels, and even the bounding boxes that outline the objects. Suddenly, it’s not just about moving the objects, but also about keeping track of their boundaries, making sure everything lines up perfectly. And you’re about to see how it all works.

Now, picture this: You’ve got an image, and you want to rotate it, say by an angle θ. The trick to rotating any image is something called an affine transformation. Sounds fancy, right? But at its core, it’s just a mathematical process that lets us stretch, scale, and, you guessed it, rotate images while keeping lines parallel. It’s like the image gets a little stretch and twist, but nothing is skewed or distorted—just neatly turned. We do this using something called a transformation matrix, which sounds like a high-tech device from a spy movie, but in reality, it’s just a 2×3 matrix that helps us figure out where to move each point in the image. You take the original coordinates of a point, mix them with this matrix, and out pops a new position for that point after rotation.

So, imagine you’re rotating an image with OpenCV. The heavy lifting happens in the cv2.getRotationMatrix2D function, but first the augmentation class needs to know how far to rotate. Its constructor handles the angle like this:


def __init__(self, angle=10):
    self.angle = angle
    if type(self.angle) == tuple:
        assert len(self.angle) == 2, "Invalid range"
    else:
        self.angle = (-self.angle, self.angle)

In this snippet, you define the rotation angle. The cool part is you can set a single value or even a range, so the rotation can vary each time. It’s like giving your model a bit of unpredictability, which is exactly what you want for robust object detection.
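
Assuming this __init__ belongs to the RandomRotate class used later in this article, usage might look like this:

# A single value: angles are sampled from (-10, 10) at call time.
rotate_small = RandomRotate(10)

# An explicit (min, max) range: angles are sampled from (-5, 30).
rotate_skewed = RandomRotate((-5, 30))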

After that, we use OpenCV to get the rotation matrix. The matrix knows how to rotate the image around its center. It’s like you’re spinning the image on a spinning wheel right in the center of the picture.


(h, w) = image.shape[:2]
(cX, cY) = (w // 2, h // 2)
M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)

Once the matrix is ready, we apply the transformation with another OpenCV function, cv2.warpAffine, which takes the image and the matrix and produces the warped result.


image = cv2.warpAffine(image, M, (w, h))

Now here’s the catch—when you rotate an image, it doesn’t just stay neatly in its original box. The corners can stretch out, and OpenCV will cut off the parts that go outside the original bounds. So, your nice, clean image might end up looking like it lost a chunk of its edges. That’s a problem, and we need to fix it.

We want to make sure the whole rotated image fits within its new box. To do this, we calculate the new width and height after rotation, making sure the entire image stays intact. Here’s where trigonometry comes in handy. We use the cosine and sine of the angle to figure out how big the new box should be:


cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])
nW = int((h * sin) + (w * cos))
nH = int((h * cos) + (w * sin))
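
To get a feel for the formula, take a hypothetical 400×300 image (w = 400, h = 300). Rotating by 90° gives cos = 0 and sin = 1, so nW = 300 and nH = 400: the dimensions simply swap, as you’d expect. At 45°, cos and sin are both about 0.707, so nW and nH both come out to roughly 495, just enough room for the rotated corners.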

So now, we’ve figured out the new size of the image, but what about the position? We can’t let our image just float around in the new box. We need it to stay centered, which means we have to adjust the image’s position within the new bounds. We shift it back to the center by adjusting the transformation matrix:


M[0, 2] += (nW / 2) - cX
M[1, 2] += (nH / 2) - cY

Finally, we wrap this up into a neat function called rotate_im. This function takes the image and the rotation angle, applies the rotation, and returns the rotated image without any unwanted cropping or misalignment.


def rotate_im(image, angle):
    """Rotate the image.

    Rotate the image such that the rotated image is enclosed inside the
    tightest rectangle. The area not occupied by the pixels of the original
    image is colored black.

    Parameters
    ----------
    image : numpy.ndarray
        numpy image
    angle : float
        angle by which the image is to be rotated

    Returns
    -------
    numpy.ndarray
        Rotated Image
    """
    (h, w) = image.shape[:2]
    (cX, cY) = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    image = cv2.warpAffine(image, M, (nW, nH))
    return image
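
As a quick sanity check, usage might look like this, assuming cv2 and numpy are already imported and "sample.jpg" stands in for any test image:

img = cv2.imread("sample.jpg")         # placeholder path
rotated = rotate_im(img, 30)
print(img.shape, "->", rotated.shape)  # the canvas grows to enclose the rotated image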

And there you go! With this function, you can rotate any image without worrying about losing important parts or misaligning objects. Your object detection model will now be able to recognize objects in all sorts of orientations—thanks to rotation and a little bit of math!

Scikit-Image Documentation on Image Transformation


OpenCV Rotation Side-Effect

You know, when you rotate an image, there’s one annoying issue that tends to pop up: parts of the image can get cropped. This happens because, as the image rotates, it might extend beyond its original boundaries. It’s like spinning a square on a table—parts of it will start hanging off the edges, right? So, how do we fix that? How can we make sure we don’t lose any part of the image while rotating it?

Well, here’s the good news: OpenCV has got our backs! cv2.warpAffine lets us pass in the output size, so instead of sticking to the original dimensions, we can size the canvas to fit the entire rotated version, ensuring no precious pixels get cropped off.

This idea comes from Adrian Rosebrock’s work on PyImageSearch, where he goes into detail about the calculations needed for this. The main problem we’re dealing with here is figuring out the new image dimensions after rotation. And you guessed it—the answer lies in a bit of trigonometry. We can use the properties of rotation to calculate the new width and height that will fully fit the rotated image.

Let’s break it down, shall we? Picture this: you have a rectangle that you want to rotate by an angle θ. As the image turns, its corners sweep outside the original frame. The outermost corners of the rectangle need more space than before, so we need a bounding canvas larger than the original image to completely enclose the rotated version.

Here’s how we calculate the new dimensions:

First, take the cosine and sine of the rotation angle. Then, use these trigonometric values to adjust both the width and height of the image.


cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])

# Compute the new bounding dimensions of the image
nW = int((h * sin) + (w * cos))
nH = int((h * cos) + (w * sin))

What’s happening here is that cos and sin represent the horizontal and vertical effects of the rotation on the original image. By multiplying these with the original dimensions (w and h for width and height), we calculate the new width (nW) and height (nH) that will fully contain the rotated image.

These new dimensions ensure that after rotation, the image will fit perfectly within its new frame, and nothing will be cropped. We’re basically expanding the image’s size just enough to make sure all the corners stay in place.

So, with this trigonometric magic, we make sure that the rotated object fits inside the image’s new boundary without losing any important details. This method is especially important in object detection tasks, where accuracy is key. When you’re working with bounding boxes that need to adjust to the rotation, you want to make sure that nothing gets missed. And this is the trick to making it work!

PyImageSearch OpenCV Image Rotation Guide

Rotating the Bounding Box

When it comes to image augmentation, rotating the bounding box is often one of the toughest challenges. Unlike simply rotating the whole image, where everything spins evenly, rotating a bounding box is more like trying to fit a tilted rectangle into a straight-edged box. This means not only do you have to rotate the box, but also adjust the position and shape of the object within the image. It’s like trying to fit a square peg into a round hole, but with a bit more math involved.

Let’s walk through how we can tackle this tricky task step by step.


Imagine this: you’re rotating an image. When you rotate the image, the bounding box around an object, which is normally defined by its top-left and bottom-right corners, rotates too. This results in a tilted bounding box. But we need more than just the rotated box. We need to find the smallest rectangle that can fully enclose this tilted box while still staying aligned with the sides of the original image.

To make it clearer, let’s break it down. Think about the first box, which is neatly aligned, and then rotate it. The goal now is to capture all four corners of the rotated box so we can calculate its new dimensions. You could technically calculate the final bounding box using just two corners, but that involves some tricky trigonometric math. It’s much easier—and cleaner—to work with all four corners of the box, which gives us a much simpler way to compute the final size of the bounding box.

Getting the Four Corners of the Bounding Box

The next step is to grab the coordinates of all four corners of the bounding box. To do this, we write a handy function called get_corners. This function takes in the original bounding boxes and gives us the coordinates of all the corners. Here’s how it works:


def get_corners(bboxes):
    """Get corners of bounding boxes.

    Parameters
    ----------
    bboxes: numpy.ndarray
        Numpy array containing bounding boxes of shape `N x 4`, where N is the
        number of bounding boxes and the bounding boxes are represented by the
        format `x1 y1 x2 y2`.

    Returns
    -------
    numpy.ndarray
        Numpy array of shape `N x 8` containing N bounding boxes, each described
        by their corner coordinates `x1 y1 x2 y2 x3 y3 x4 y4`.
    """
    width = (bboxes[:, 2] - bboxes[:, 0]).reshape(-1, 1)
    height = (bboxes[:, 3] - bboxes[:, 1]).reshape(-1, 1)
    x1 = bboxes[:, 0].reshape(-1, 1)
    y1 = bboxes[:, 1].reshape(-1, 1)
    x2 = x1 + width
    y2 = y1
    x3 = x1
    y3 = y1 + height
    x4 = bboxes[:, 2].reshape(-1, 1)
    y4 = bboxes[:, 3].reshape(-1, 1)
    corners = np.hstack((x1, y1, x2, y2, x3, y3, x4, y4))
    return corners

After running this function, each bounding box will be described by eight coordinates: x1, y1, x2, y2, x3, y3, x4, y4. These are the four corners of the bounding box before rotation, and they’ll be super useful when we start rotating them.
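
To see what that looks like in practice, here’s a small hypothetical example (the coordinates are made up):

import numpy as np

# Two boxes in x1 y1 x2 y2 format.
bboxes = np.array([[ 50.,  30., 120.,  90.],
                   [200., 150., 260., 210.]])

corners = get_corners(bboxes)
print(corners.shape)  # (2, 8)
print(corners[0])     # [ 50.  30. 120.  30.  50.  90. 120.  90.]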

Rotating the Bounding Box Using the Transformation Matrix

Now, it’s time to rotate the bounding box. To do this, we’ll use a transformation matrix—a key tool in geometry and image manipulation. We define another function, rotate_box, which rotates the corners of the bounding box by a specified angle. Here’s how it looks in action:


def rotate_box(corners, angle, cx, cy, h, w):
    """Rotate the bounding box.

    Parameters
    ----------
    corners: numpy.ndarray
        Numpy array of shape `N x 8` containing N bounding boxes, each described
        by their corner coordinates `x1 y1 x2 y2 x3 y3 x4 y4`.
    angle: float
        Angle by which the image is to be rotated.
    cx: int
        X-coordinate of the center of the image (about which the box will be rotated).
    cy: int
        Y-coordinate of the center of the image (about which the box will be rotated).
    h: int
        Height of the image.
    w: int
        Width of the image.

    Returns
    -------
    numpy.ndarray
        Numpy array of shape `N x 8` containing N rotated bounding boxes, each
        described by their corner coordinates `x1 y1 x2 y2 x3 y3 x4 y4`.
    """
    corners = corners.reshape(-1, 2)
    corners = np.hstack((corners, np.ones((corners.shape[0], 1), dtype=type(corners[0][0]))))
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    M[0, 2] += (nW / 2) - cx
    M[1, 2] += (nH / 2) - cy
    calculated = np.dot(M, corners.T).T
    calculated = calculated.reshape(-1, 8)
    return calculated

Here, the rotate_box function does the heavy lifting. It applies the rotation matrix to the corners of the bounding box and ensures that the rotated box stays centered by adjusting the translation accordingly.
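
Continuing the hypothetical example from get_corners above, a call might look like this (the image size and angle are arbitrary):

h, w = 400, 600
cx, cy = w // 2, h // 2

# Rotate the corner coordinates by the same angle applied to the image.
rotated_corners = rotate_box(corners, 15, cx, cy, h, w)
print(rotated_corners.shape)  # (2, 8) -- still four (x, y) pairs per box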

Calculating the Tightest Bounding Box

Now, let’s talk about the final step—calculating the tightest bounding box that can fully enclose the rotated bounding box. We use a function called get_enclosing_box for this task. This function looks at the rotated corner coordinates and computes the smallest rectangle that can fully contain them.

Here’s the code for that:


def get_enclosing_box(corners):
    """Get an enclosing box for rotated corners of a bounding box.

    Parameters
    ----------
    corners: numpy.ndarray
        Numpy array of shape `N x 8` containing N bounding boxes, each described
        by their corner coordinates `x1 y1 x2 y2 x3 y3 x4 y4`.

    Returns
    -------
    numpy.ndarray
        Numpy array containing enclosing bounding boxes of shape `N x 4`, where N
        is the number of bounding boxes and the bounding boxes are represented in
        the format `x1 y1 x2 y2`.
    """
    x_ = corners[:, [0, 2, 4, 6]]
    y_ = corners[:, [1, 3, 5, 7]]
    xmin = np.min(x_, 1).reshape(-1, 1)
    ymin = np.min(y_, 1).reshape(-1, 1)
    xmax = np.max(x_, 1).reshape(-1, 1)
    ymax = np.max(y_, 1).reshape(-1, 1)
    final = np.hstack((xmin, ymin, xmax, ymax, corners[:, 8:]))
    return final

This function is crucial because it helps us determine the minimum and maximum values for the x and y coordinates of the rotated bounding box, which we then use to define the smallest enclosing rectangle.
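
Picking up the same hypothetical corners, the call is a one-liner:

# Collapse the rotated corners back into axis-aligned x1 y1 x2 y2 boxes.
new_bboxes = get_enclosing_box(rotated_corners)
print(new_bboxes.shape)  # (2, 4)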

Putting It All Together

Finally, we need a function that brings everything together—the __call__ function. This function takes the image and bounding boxes, applies the rotation, and returns the transformed image and bounding boxes.


def __call__(self, img, bboxes):
    # Sample a rotation angle from the configured range.
    angle = random.uniform(*self.angle)
    w, h = img.shape[1], img.shape[0]
    cx, cy = w // 2, h // 2
    # Rotate the image on an enlarged canvas so nothing is cropped.
    img = rotate_im(img, angle)
    # Rotate the four corners of each box, then re-box them axis-aligned.
    corners = get_corners(bboxes)
    corners = np.hstack((corners, bboxes[:, 4:]))
    corners[:, :8] = rotate_box(corners[:, :8], angle, cx, cy, h, w)
    new_bbox = get_enclosing_box(corners)
    # Resize back to the original dimensions and scale the boxes to match.
    scale_factor_x = img.shape[1] / w
    scale_factor_y = img.shape[0] / h
    img = cv2.resize(img, (w, h))
    new_bbox[:, :4] /= [scale_factor_x, scale_factor_y, scale_factor_x, scale_factor_y]
    bboxes = new_bbox
    # Clip boxes to the frame and drop those that lost too much area.
    bboxes = clip_box(bboxes, [0, 0, w, h], 0.25)
    return img, bboxes
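
One helper here, clip_box, isn’t shown in this article; it comes from the same augmentation library and clips boxes to the image frame, dropping any box that loses too much of its area. As a rough, illustrative sketch of that idea (the library’s actual implementation may differ):

import numpy as np

def clip_box(bboxes, frame, alpha):
    # Illustrative sketch only: clip boxes to frame = [x_min, y_min, x_max, y_max]
    # and keep a box only if it retains at least (1 - alpha) of its original area.
    x_min, y_min, x_max, y_max = frame
    area_before = (bboxes[:, 2] - bboxes[:, 0]) * (bboxes[:, 3] - bboxes[:, 1])
    clipped = bboxes.copy()
    clipped[:, 0] = np.clip(bboxes[:, 0], x_min, x_max)
    clipped[:, 1] = np.clip(bboxes[:, 1], y_min, y_max)
    clipped[:, 2] = np.clip(bboxes[:, 2], x_min, x_max)
    clipped[:, 3] = np.clip(bboxes[:, 3], y_min, y_max)
    area_after = (clipped[:, 2] - clipped[:, 0]) * (clipped[:, 3] - clipped[:, 1])
    keep = area_after / (area_before + 1e-8) >= (1 - alpha)
    return clipped[keep]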

The __call__ method performs the entire rotation process: rotating the image, adjusting the bounding boxes, recalculating their positions, and ensuring that the image is scaled properly. It even makes sure the bounding boxes stay in the correct position after the rotation, maintaining their alignment with the objects in the image. And there you have it—rotation of bounding boxes handled seamlessly!

Rotation-based Image Augmentation Techniques

Bounding Box Rotation for Image Augmentation (2022)

Shearing

Imagine you’re looking at an image, and you want to skew it, but not in the usual way. Instead of just stretching or squeezing it, you decide to make it look like it’s leaning to one side. That’s shearing. It’s one of those cool tricks in image augmentation that makes your object detection models much stronger. The goal? To help your model recognize objects, even when they’re viewed from weird angles, which happens all the time in the real world.

The Shear Transformation

Let’s break it down: when you shear an image, you’re shifting its pixels horizontally. It’s like taking a picture and pushing it sideways. For every pixel at a point (x, y), we change its x-coordinate by adding some value of alpha * y. That “alpha” is our shearing factor. The bigger the value of alpha, the more the image will tilt. So, imagine if alpha were 0.1 – the image would barely lean. But if alpha were 1.5, it would look like the whole scene is tipping over.

This horizontal transformation doesn’t mess with the vertical direction. The height of the image stays the same, but the shape of the objects inside it gets skewed. And this kind of change? It’s a game-changer for object detection models. It teaches the model to handle and recognize objects, even when they appear tilted or slanted.
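
To see the effect with concrete, made-up numbers: with alpha = 0.2, a pixel at (10, 100) moves to (10 + 0.2 × 100, 100) = (30, 100), while a pixel on the top row (y = 0) doesn’t move at all. Pixels slide further the lower they sit in the image, which is exactly what produces the lean.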

Now, let’s see how we can perform this transformation using a class called RandomShear:


class RandomShear(object):
    """Randomly shears an image in the horizontal direction.

    Bounding boxes with an area of less than 25% in the transformed image
    are dropped. The resolution of the image is maintained, and any
    remaining empty areas are filled with black color.

    Parameters
    ----------
    shear_factor: float or tuple(float)
        If a float, the image is sheared horizontally by a factor drawn
        randomly from a range (-`shear_factor`, `shear_factor`). If a tuple,
        the `shear_factor` is drawn randomly from values specified in the tuple.

    Returns
    -------
    numpy.ndarray
        Sheared image in the numpy format of shape `HxWxC`.
    numpy.ndarray
        Transformed bounding box coordinates in the format `n x 4`, where `n`
        is the number of bounding boxes, and 4 represents `x1, y1, x2, y2`
        of the bounding box.
    """
    def __init__(self, shear_factor=0.2):
        self.shear_factor = shear_factor
        if type(self.shear_factor) == tuple:
            assert len(self.shear_factor) == 2, "Invalid range for shear factor"
        else:
            self.shear_factor = (-self.shear_factor, self.shear_factor)

In this class, we set the shear_factor—that’s the value that controls how much the image skews. It can either be a fixed value, or we can let it pick a random number within a range. This randomness helps make the model even stronger because it learns to handle different distortions.

Augmentation Logic

So how exactly do we apply the shear to the image? The logic is pretty straightforward: we tweak the x-coordinates of the bounding box corners using the formula x = x + alpha * y. The image itself will lean, and the bounding boxes need to keep up, adjusting their x-coordinates to stay aligned with the new positions of the objects.

Here’s how the magic happens:


def __call__(self, img, bboxes):
    # Sample a shear factor from the configured range.
    shear_factor = random.uniform(*self.shear_factor)
    w, h = img.shape[1], img.shape[0]
    # Negative shear: flip, apply a positive shear, then flip back.
    if shear_factor < 0:
        img, bboxes = HorizontalFlip()(img, bboxes)
    M = np.array([[1, abs(shear_factor), 0], [0, 1, 0]])
    # The sheared image is wider; widen the canvas so nothing is cropped.
    nW = img.shape[1] + abs(shear_factor * img.shape[0])
    # Shift the x-coordinates of the boxes by alpha * y.
    bboxes[:, [0, 2]] += ((bboxes[:, [1, 3]]) * abs(shear_factor)).astype(int)
    img = cv2.warpAffine(img, M, (int(nW), img.shape[0]))
    if shear_factor < 0:
        img, bboxes = HorizontalFlip()(img, bboxes)
    # Resize back to the original dimensions and rescale box x-coordinates.
    img = cv2.resize(img, (w, h))
    scale_factor_x = nW / w
    bboxes[:, :4] /= [scale_factor_x, 1, scale_factor_x, 1]
    return img, bboxes

Here’s what’s happening:

  • Random Shear Factor: The shear_factor is randomly chosen within a range (we can control that range).
  • Horizontal Flip (if needed): If the shear factor is negative, we flip the image horizontally before shearing. This ensures the bounding boxes don’t shrink incorrectly.
  • Transformation Matrix (M): This matrix applies the horizontal shear. It changes the x-coordinates based on the y-values, and we use OpenCV’s cv2.warpAffine function to apply this to the image.
  • Bounding Box Adjustment: The bounding box coordinates are modified based on the shear factor, so the bounding boxes stay aligned with their objects.
  • Rescaling: After shearing the image, we resize it back to its original dimensions. This keeps the image’s resolution intact.

Handling Negative Shear

Now, here’s a little twist—negative shearing. When we shear negatively, the bottom-right corner of the bounding box (usually referred to as x2) might move in the opposite direction. This can mess with the bounding box, causing it to shrink or become misaligned.

So, what do we do about it? We use a neat trick: flip the image and bounding boxes first, apply the positive shear, and then flip them back. This lets us apply the shear as if it were positive and keeps the bounding boxes in check.

Here’s the fix:


if shear_factor < 0:
    img, bboxes = HorizontalFlip()(img, bboxes)
# Apply the positive shear transformation
M = np.array([[1, abs(shear_factor), 0], [0, 1, 0]])
nW = img.shape[1] + abs(shear_factor * img.shape[0])
bboxes[:, [0, 2]] += ((bboxes[:, [1, 3]]) * abs(shear_factor)).astype(int)
img = cv2.warpAffine(img, M, (int(nW), img.shape[0]))
# Flip back the image and bounding boxes after the shear
if shear_factor < 0:
    img, bboxes = HorizontalFlip()(img, bboxes)
# Resize image to original dimensions
img = cv2.resize(img, (w, h))

Testing the Shear Augmentation

Now that everything’s set up, it’s time to test it. You can combine the shear with rotation, which is a great way to push your object detection models to handle more variations. Here’s how you test it:


from data_aug.bbox_utils import *
import matplotlib.pyplot as plt
rotate = RandomRotate(20)
shear = RandomShear(0.7)
img, bboxes = rotate(img, bboxes)
img, bboxes = shear(img, bboxes)
plt.imshow(draw_rect(img, bboxes))

In this test:

  • The RandomRotate class rotates the image and bounding boxes.
  • The RandomShear class applies the horizontal shear to the rotated image and bounding boxes.
  • Finally, draw_rect shows the transformed image with bounding boxes to confirm everything’s still aligned.

Wrapping It Up

So, that’s shearing in a nutshell! By applying both positive and negative shearing transformations, you’re teaching your object detection models to handle various perspectives and distortions. And by adjusting the bounding boxes to fit the new shape, you ensure that everything stays in sync. With this powerful augmentation technique, your models will be better equipped to recognize objects no matter how they’re tilted or skewed!

ImageNet Large-Scale Visual Recognition Challenge (2017)

Testing it out

Alright, we’ve just finished implementing both the rotation and shearing augmentations. Now comes the fun part—testing them out! This is where we get to see the magic happen and make sure everything works just like we want. We need to check if the transformations are applied properly to both the image and its bounding boxes, keeping everything lined up and intact. Let’s dive into how we do that.

Picture this: we’ve got an image, and on that image, there are bounding boxes outlining different objects. Now, we want to see how these bounding boxes react when we rotate the image or shear it. You can think of it like taking a photo, rotating it at a random angle, and then giving it a little tilt to one side. But the real challenge is making sure the bounding boxes still match up with the objects, even after all that twisting and turning.

Here’s a simple setup to test it:


from data_aug.bbox_utils import *
import matplotlib.pyplot as plt

rotate = RandomRotate(20)  # Initialize the rotation augmentation with a 20-degree range.
shear = RandomShear(0.7)   # Initialize the shearing augmentation with a shear factor of 0.7.

# Apply rotation and shear augmentations to the image and bounding boxes.
img, bboxes = rotate(img, bboxes)
img, bboxes = shear(img, bboxes)

# Visualize the result by drawing the bounding boxes on the image.
plt.imshow(draw_rect(img, bboxes))

Let’s break it down:

Rotation Augmentation: The RandomRotate class is set up with a 20-degree range. This means the image—and everything in it—gets rotated by a random angle between -20 and +20 degrees. It’s like taking that photo, spinning it a bit, and checking how the objects still fit within their boxes.

Shearing Augmentation: Next, we’ve got the RandomShear class with a shear factor of 0.7. This factor controls how much the image gets skewed horizontally. The larger the factor, the more dramatic the tilt! Shearing changes the image’s shape but keeps its size intact. It’s like pulling one side of the photo to the left and watching everything stretch.

Bounding Box Adjustment: Now, the real magic happens. After we apply the transformations, we use the draw_rect function to visualize the bounding boxes on top of the transformed image. This makes sure that, even after rotating and shearing, the bounding boxes still fit snugly around the objects. It’s like making sure the frame around your picture doesn’t get distorted when you rotate or stretch it.

Visualization: Finally, plt.imshow(draw_rect(img, bboxes)) takes care of displaying the final image with the bounding boxes. It’s like pulling up the curtain to reveal your masterpiece. This lets us see if everything’s still aligned and properly adjusted.

And here comes the twist—Resizing:

Once the rotation and shearing are done, there’s one more step to handle: resizing. While rotation and shearing are more about transforming the image, resizing is a little different—it’s like prepping the image to fit the model’s requirements. It adjusts the image’s dimensions to make sure it matches what the model expects for training or inference.

Even though resizing isn’t strictly a data augmentation technique, it’s crucial for standardizing the size of the images before they’re passed to the model. Think of it like making sure all the puzzle pieces are the same size before you try to fit them together.
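
As a rough sketch of that resizing step (the function name and the 416×416 target are placeholders; use whatever your model expects), the key detail is scaling the boxes by the same factors as the image:

import cv2

def resize_with_boxes(img, bboxes, target_w=416, target_h=416):
    # Illustrative sketch: resize the image and scale its boxes to match.
    h, w = img.shape[:2]
    img = cv2.resize(img, (target_w, target_h))
    scale_x, scale_y = target_w / w, target_h / h
    bboxes = bboxes.astype(float)
    bboxes[:, [0, 2]] *= scale_x  # x1, x2
    bboxes[:, [1, 3]] *= scale_y  # y1, y2
    return img, bboxes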

With rotation, shearing, and resizing, we now have a solid set of image augmentations. These transformations don’t just tweak the images, making them more varied, but also help your object detection models become more robust. By introducing these different distortions, your model learns to recognize objects from various angles, distortions, and scales—just like how you might encounter them in the real world.

Data Augmentation for Object Detection

Conclusion

In conclusion, data augmentation techniques like rotation and shearing are essential tools for improving object detection models. By introducing variations in object orientation and perspective, these transformations help models become more robust and reliable in real-world scenarios. Rotation prevents overfitting by ensuring the model can recognize objects from different angles, while shearing simulates perspective changes commonly found in images. When applied correctly, these augmentations significantly enhance model performance and accuracy. Looking ahead, as the demand for more adaptable and accurate object detection models grows, incorporating these techniques will continue to be a crucial part of optimizing machine learning workflows. With rotation and shearing in your data augmentation toolkit, your models will be better equipped to handle a wide range of challenges and provide more reliable predictions.


Alireza Pourmahdavi

I’m Alireza Pourmahdavi, a founder, CEO, and builder with a background that combines deep technical expertise with practical business leadership. I’ve launched and scaled companies like Caasify and AutoVM, focusing on cloud services, automation, and hosting infrastructure. I hold VMware certifications, including VCAP-DCV and VMware NSX. My work involves constructing multi-tenant cloud platforms on VMware, optimizing network virtualization through NSX, and integrating these systems into platforms using custom APIs and automation tools. I’m also skilled in Linux system administration, infrastructure security, and performance tuning. On the business side, I lead financial planning, strategy, budgeting, and team leadership while also driving marketing efforts, from positioning and go-to-market planning to customer acquisition and B2B growth.
