Implement LiteRT for a segmentation task utilizing the FastSAM model by Ultralytics.

George Soloupis
4 min read · Sep 19, 2024


Written by George Soloupis, ML and Android GDE.

FastSAM, developed by Ultralytics, is a segmentation model built for rapid image segmentation that tackles the Segment Anything Model (SAM) task with a much lighter, CNN-based architecture. It simplifies the process of detecting and segmenting objects within an image, using various prompts such as bounding boxes, key points, and textual descriptions. The model is particularly optimized for single-class segmentation, making it efficient for tasks requiring fast and generalized object segmentation in images or videos.

Additionally, Google’s LiteRT (short for Lite Runtime), previously named TensorFlow Lite, can further enhance its performance by enabling efficient inference on edge devices, making FastSAM applicable to real-time tasks with limited computational resources. LiteRT is designed for efficient deployment of machine learning models and complements FastSAM by optimizing runtime performance in resource-constrained environments, such as mobile or embedded systems. This synergy makes segmentation tasks faster and more accessible for edge-based AI applications, where speed and low latency are critical.

FastSAM achieves real-time segmentation by decoupling the segmentation task into all-instance segmentation with YOLOv8-seg and prompt-guided selection stages. By utilizing the computational efficiency of CNNs, FastSAM offers significant reductions in computational and resource demands while maintaining competitive performance. This dual-stage approach enables FastSAM to deliver fast and efficient segmentation suitable for applications requiring quick results.
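
To make the two stages concrete, here is a toy sketch of the prompt-guided selection idea (my own simplification, not the Ultralytics implementation): the first stage produces a mask for every instance, and a box prompt then simply picks the mask that overlaps it best.

import numpy as np

def select_by_box(masks, box):
    # Toy prompt-guided selection: given all-instance masks from the first stage,
    # return the mask that best overlaps a box prompt (x1, y1, x2, y2).
    # masks: boolean array of shape (N, H, W).
    x1, y1, x2, y2 = box
    box_mask = np.zeros(masks.shape[1:], dtype=bool)
    box_mask[y1:y2, x1:x2] = True
    ious = [
        np.logical_and(m, box_mask).sum() / np.logical_or(m, box_mask).sum()
        for m in masks
    ]
    return masks[int(np.argmax(ious))]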

Ultralytics simplifies the process of converting models into various formats, such as .tflite, which we used with LiteRT. This can be done with just a few simple commands:

!pip install ultralytics
from ultralytics import FastSAM

!wget https://github.com/ultralytics/assets/releases/download/v8.2.0/FastSAM-s.pt

# Use the model that has already been downloaded
model = FastSAM("FastSAM-s.pt")

# Use 'saved_model' parameter and find the final tflite files inside the generated folder.
# Options are ('torchscript', 'onnx', 'openvino', 'engine', 'coreml', 'saved_model', 'pb', 'tflite', 'edgetpu', 'tfjs', 'paddle', 'ncnn')
model.export(format="saved_model")
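
After the export finishes, the .tflite variants end up inside the generated saved_model folder. A quick way to locate them (a sketch; the exact folder name is an assumption, typically the model name plus a _saved_model suffix):

import glob

# List the .tflite files produced inside the export folder
print(glob.glob("FastSAM-s_saved_model/*.tflite"))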

Before diving into mobile development, it’s always helpful to use the Python API to verify your results. This process involves three key steps: preprocessing, inference, and post-processing.

The preprocessing is straightforward: you simply divide each channel of every pixel in the image by 255, ensuring that the input values fall within the range of 0.0 to 1.0:

import numpy as np

def preprocess(img):
    img = np.array(img, dtype=np.float32)
    img /= 255  # scale 0 - 255 to 0.0 - 1.0
    return img
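
The model expects a 640x640 RGB input, so the image is resized before this division. A minimal sketch with PIL (the file name is just an example; the inference snippet below simply loads an image that has already been resized):

from PIL import Image

# Resize to the 640x640 input the model expects, then scale to [0.0, 1.0]
image = Image.open("input.png").convert("RGB").resize((640, 640))
image = preprocess(image)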

The inference uses the LiteRT Interpreter class:

from PIL import Image
import tensorflow as tf

# We load the resized_image since it has the 640x640 dimensions the model requires.
image_input = Image.open("/content/resized_image.png")
# Preprocess the image and add the batch dimension.
image_input = preprocess(np.array(image_input))
image_input = np.expand_dims(image_input, axis=0)

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="/content/FastSAM-s_float16_final.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
print(input_details)
output_details = interpreter.get_output_details()
print(output_details)

# Feed the preprocessed image to the model.
input_data = np.array(image_input, dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

output_data_0 = interpreter.get_tensor(output_details[0]['index'])
print(output_data_0.shape)
output_data_1 = interpreter.get_tensor(output_details[1]['index'])
print(output_data_1.shape)

The post-processing involves several steps to segment and color the objects detected within the bounding boxes recognized by the model. You can find the detailed implementation in the Colab notebook included within the Android project here.
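
At its core, that post-processing decodes the two output tensors above: assuming the usual YOLOv8-seg layout, the (1, 37, 8400) tensor holds 4 box values, 1 confidence score and 32 mask coefficients per candidate, and the (1, 160, 160, 32) tensor holds the mask prototypes. Here is a minimal NumPy sketch of the idea, not the notebook’s exact code (confidence filtering only; NMS, mask cropping and upscaling are left to the notebook):

def decode_outputs(pred, protos, conf_thres=0.4):
    # pred: (1, 37, 8400) -> per candidate: 4 box values, 1 score, 32 mask coefficients
    # protos: (1, 160, 160, 32) mask prototypes
    pred = pred[0].T                                    # (8400, 37)
    boxes, scores, coeffs = pred[:, :4], pred[:, 4], pred[:, 5:]
    keep = scores > conf_thres                          # confidence filtering (NMS omitted)
    boxes, coeffs = boxes[keep], coeffs[keep]

    protos = protos[0]                                  # (160, 160, 32)
    masks = 1.0 / (1.0 + np.exp(-(protos @ coeffs.T)))  # sigmoid(prototypes . coefficients)
    masks = np.transpose(masks, (2, 0, 1)) > 0.5        # (N, 160, 160) boolean masks
    return boxes, masks

# Usage with the interpreter outputs from above
boxes, masks = decode_outputs(output_data_0, output_data_1)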

Result from Python API inference.

The Android implementation must follow the same steps: preprocess the image, perform the inference, and then post-process the results to generate a visual representation of the mask.

We can visualize the structure of the .tflite model with Model Explorer:

Model Explorer graph.
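
To reproduce this graph locally, Model Explorer can be installed from pip and pointed at the converted file (a sketch, assuming the ai-edge-model-explorer package and its command-line entry point):

!pip install ai-edge-model-explorer
!model-explorer /content/FastSAM-s_float16_final.tflite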

LiteRT provides a support library that simplifies image preprocessing, allowing us to avoid manually creating and manipulating the ByteBuffer. This makes the process more efficient and streamlined.

// Inputs
val imageProcessor =
    ImageProcessor.Builder()
        // Resize to the 640x640 input the model expects.
        .add(ResizeOp(Utils.MODEL_INPUTS_SIZE, Utils.MODEL_INPUTS_SIZE, ResizeOp.ResizeMethod.BILINEAR))
        // Divide every channel by 255, matching the Python preprocessing.
        .add(NormalizeOp(0.0f, 255.0f))
        .build()
var tensorImage = TensorImage(DataType.FLOAT32)
tensorImage.load(inputImage)
tensorImage = imageProcessor.process(tensorImage)
val inputTensorBuffer = tensorImage.buffer
val inputArray = arrayOf(inputTensorBuffer)

Inference is easily done by creating the output buffers that are going to encapsulate the results:

val probabilityBuffer1 = TensorBuffer.createFixedSize(
    intArrayOf(1, 37, 8400),
    DataType.FLOAT32
)
val probabilityBuffer2 = TensorBuffer.createFixedSize(
    intArrayOf(1, 160, 160, 32),
    DataType.FLOAT32
)
val outputMap = HashMap<Int, Any>()
outputMap[0] = probabilityBuffer1.buffer
outputMap[1] = probabilityBuffer2.buffer

// Run the inference
interpreterFastSam?.runForMultipleInputsOutputs(inputArray, outputMap)

Post-processing also involves a number of steps to create the final array of masks that are placed on top of the image being used. You can see the steps in this file of the project.

The result is impressive and it is delivered really fast:

Result after the Android inference.

You can find more information and the complete project at this GitHub repository. Additionally, if you want to see blazing-fast inference with GPU support, you can try this branch.

Conclusion
This guide explained how to implement LiteRT (formerly TensorFlow Lite) for real-time segmentation using the FastSAM model by Ultralytics. FastSAM enables efficient segmentation tasks, and when combined with LiteRT, it supports fast inference on edge devices like mobile phones. The FastSAM model can be easily converted to formats like .tflite for mobile deployment. The Android implementation follows the same steps as the Python API: image preprocessing, inference, and post-processing to generate visual masks. LiteRT’s support library simplifies preprocessing, avoiding manual ByteBuffer manipulation, and allows for efficient execution in resource-constrained environments.

