Enhancing a small dataset to improve a TFLite segmentation model

George Soloupis
7 min read · Nov 21, 2022


Written by George Soloupis ML GDE.

This blog post demonstrates the impact of Albumentations on a small dataset used to create a segmentation model that is deployed on a mobile phone as a .tflite file. By enhancing the dataset with some of the library's transforms, we overcome several cases where the model trained on the unenhanced dataset fails on the phone. At the end, as an experiment, we show how a large dataset can be created from a single image and still achieve high accuracy and mIoU after training!

The problem

A segmentation model was created to identify documents in photos taken by a mobile phone, with a dataset of only 400 images. The dataset was provided by ARxVision, a startup that has developed a wearable device that captures the world through audio and artificial intelligence to empower blind and low-vision individuals. You can see an example of a photo and its segmentation mask below:

Example of an image with its mask.

After training a U-Net with a ResNet50 backbone, the accuracy was pretty good. The model was converted to a .tflite file, which is handy for use inside a mobile phone, and we started real-world testing. At the beginning, with pictures similar to the dataset, the model was doing fine. Check below for an upright image segmentation result:
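The post does not include the conversion code, so here is a minimal sketch of the Keras-to-TFLite step. The tiny stand-in model below is only there to make the snippet self-contained; in the actual project it would be the trained U-Net with the ResNet50 backbone.

```python
import tensorflow as tf

# Stand-in Keras model; in the real project this would be the trained
# U-Net with a ResNet50 backbone.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(1, 3, padding='same', activation='sigmoid'),
])

# Convert the Keras model to a .tflite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

# Save the flatbuffer so it can be bundled with the mobile app.
with open('segmentation_model.tflite', 'wb') as f:
    f.write(tflite_model)
```

The resulting file is what gets shipped inside the Android app and run with the TFLite interpreter.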

Normal photo similar to the dataset.

But when we experimented with different cases, the model started to fail.

  1. Rotated target in the photo:

Rotated document in the image.

2. Cropped at the edge:

Cropped document.

3. Distorted by the mobile’s camera:

Distorted document with a mobile’s wide angle camera.

The above failure cases are common when a mobile phone is used, since the user is free to capture an image at any angle and under any circumstances. So what is the solution if the dataset is relatively small and you are unable, or do not have the time, to capture new images and create masks?

The solution

Albumentations comes as a really good solution to the above issues. This library is a computer vision tool that boosts the performance of deep convolutional neural networks by doing fast and flexible image augmentations. Albumentations efficiently implements a rich variety of image transform operations that are optimized for performance, and does so while providing a concise, yet powerful image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection.

Looking at the API reference, there are numerous enhancements that manipulate the image with Blur, Crop, Dropout and Geometric transforms. Depending on the issue you want to solve, you can pick a transform, and the library converts the image and its corresponding mask with a few lines of code:

import albumentations as A
import cv2

image = cv2.imread('image.jpg')
mask = cv2.imread('mask.png', cv2.IMREAD_GRASCALE if False else cv2.IMREAD_GRAYSCALE)

# Flip the image horizontally (around the vertical axis);
# p=1.0 means the transform is always applied
aug = A.HorizontalFlip(p=1.0)

# The exact same transform is applied to the image and its mask
augmented = aug(image=image, mask=mask)

image_flipped = augmented['image']
mask_flipped = augmented['mask']

# Save the augmented pair
cv2.imwrite('image_flipped.jpg', image_flipped)
cv2.imwrite('mask_flipped.png', mask_flipped)

So let’s pick the appropriate transform to solve the above issues one by one.

  1. Issue with rotated target inside the photo.

For this we are going to use the Rotate, or better, the ShiftScaleRotate transform from the library. It is as easy as:

    aug = A.ShiftScaleRotate(p=0.5)

You can play with different values for the shift, scale and rotation limits, or leave the default settings. Enhancing the dataset with images generated by this transform gives us a model that can handle rotation of different types of documents inside a photo taken by a mobile phone:

Rotated document captured by mobile phone.

2. Issue with cropped target at the edges.

The ShiftScaleRotate transform can also enhance the dataset with images that are scaled and cropped at the edge. Take a look at an example of how the library transforms the images:

With ShiftScaleRotate option.

or

Another example with ShiftScaleRotate transform.

Having images like the above inside the dataset helps the model on the mobile phone to “find” cropped documents:

Mobile’s model recognizing cropped documents.

3. Issue with mobile’s camera distortion.

This type of problem happens when you deploy the model on a mobile phone, which usually has a wide-angle camera. A wide-angle lens has a focal length of 35mm or shorter, which gives you a wide field of view: the wider the field of view, the more of the scene fits in the frame. But this comes with distortion, such as barrel distortion, and if the dataset does not contain images with this kind of deformation, the output will most probably be unusable. Since there are thousands of mobile phone models out there, calibrating each one to correct its images, as OpenCV allows, is too difficult! The solution here is to include distorted images in the dataset.

Check the PiecewiseAffine method:

PiecewiseAffine manipulation.

And the ElasticTransform:

ElasticTransform manipulation.

Enhancing the dataset with images like the above can help the model inside the mobile phone to handle the camera’s distortion:

Mobile photo using wide angle camera.

In our example, applying the ShiftScaleRotate, PiecewiseAffine and ElasticTransform transforms to the 400 images resulted in 4,000 images with corresponding masks, which were used to train a TFLite model ready to handle the different types of problems that occur when it is used inside a mobile phone.

Technical data.

The simple model trained on the 400-image dataset had a validation accuracy of 79% and mIoU = 63.5%.

We created a dataset of 4,000 images from the small dataset using only the ShiftScaleRotate transform. The result was a trained model with a validation accuracy of 96% and mIoU = 93%. This model will be used by ARxVision as an alternative for scanning documents inside images, helping vision-impaired individuals.

As an experiment, we also created a dataset of 10,000 images starting from only one image, the first one showcased here. We used RGBShift to change the colors (70% of the dataset), creating 10 images from 1. Then PiecewiseAffine was applied, producing 100 images from those 10. To handle rotation, cropping and different sizes, ShiftScaleRotate was used to create 1,000 images from the 100. Finally, ElasticTransform was used to handle the camera’s distortion, producing 10,000 images from the last 1,000. After training, the validation accuracy was 80% and the mIoU = 60%, almost the same as the model trained on the 400 unenhanced images. (This model was used to showcase the 3 solutions above!)

Conclusion

When you have a relatively small dataset and want to enhance it, but you are unable to add real-world examples to it, the Albumentations library is a handy alternative. It offers numerous transforms you can pick from to overcome issues like cropped, rotated or distorted targets, cases that are not rare when the model is deployed on a mobile phone, where users are free to take a picture at their convenience.

Thanks to Sayak Paul for bringing the Albumentations library to my attention.
Special thanks to Charles Leclercq, CEO of ARxVision, for providing the dataset.


Written by George Soloupis

I am a pharmacist turned Android developer and machine learning engineer. Right now I work on accessibility in the Zolup browser, and I am an ML & Android GDE.
