Semantic segmentation is a type of image segmentation that classifies every pixel as belonging to a particular class of object; for example, this demo classifies the pixels of cat/kitten images as foreground or background. It is one of the most difficult computer vision tasks because the granularity of classification is the smallest possible (a single pixel), compared with a whole image (image classification) or a bounding box (object detection), both of which require far fewer outputs. For example, a 100px by 100px image requires 10,000 separate per-pixel class decisions for segmentation. The model trained here is based on the deep learning U-Net architecture and was trained on the Oxford-IIIT Pet dataset. Training took around 24 hours and was run on a GPU in Azure Machine Learning Studio. Read more about Azure Machine Learning Studio here.
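As a quick sanity check on that output count, a segmentation model's raw output can be reduced to one class decision per pixel. This is a hypothetical sketch using NumPy; the random logits stand in for a real model's output, and the shapes and class count are illustrative, not the demo model's actual values:

```python
import numpy as np

# Stand-in for a segmentation model's raw output on a 100x100 image
# with 2 classes (cat foreground vs. background).
h, w, num_classes = 100, 100, 2
logits = np.random.rand(h, w, num_classes)

# One classification decision per pixel: pick the highest-scoring class.
mask = np.argmax(logits, axis=-1)

print(mask.shape)  # (100, 100)
print(mask.size)   # 10000 per-pixel decisions
```

By contrast, image classification would emit a single decision for the whole image, and object detection a handful of box coordinates and labels.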
The demo below segments images of cats into foreground and background. The majority of the training data consisted of front-facing images of cats, so similar images will perform best with this model. The model is not perfect and there is room for improvement; if a non-cat image is uploaded, it will perform poorly. Models such as this could be combined into a pipeline in which an object detection model first identifies a bounding box around a cat, and a segmentation model such as this one then classifies each pixel within that box.
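Such a pipeline can be sketched as follows. This is a minimal illustration with stub functions standing in for real models; `detect_cat_bbox` and `segment_pixels` are hypothetical names, and a trained detector and U-Net would replace their bodies:

```python
import numpy as np

def detect_cat_bbox(image):
    # Stub for an object detector; returns (x, y, width, height).
    # A real trained detection model would go here.
    return (10, 10, 50, 50)

def segment_pixels(crop):
    # Stub for the segmentation model; returns a binary foreground
    # mask for the cropped region. A real U-Net would go here.
    return np.ones(crop.shape[:2], dtype=np.uint8)

def pipeline(image):
    # 1. Detect a bounding box around the cat.
    x, y, w, h = detect_cat_bbox(image)
    # 2. Segment only the cropped region inside the box.
    crop = image[y:y + h, x:x + w]
    crop_mask = segment_pixels(crop)
    # 3. Paste the crop's mask back into a full-size mask.
    full_mask = np.zeros(image.shape[:2], dtype=np.uint8)
    full_mask[y:y + h, x:x + w] = crop_mask
    return full_mask

image = np.zeros((100, 100, 3), dtype=np.uint8)
mask = pipeline(image)
print(mask.shape)  # (100, 100)
```

Running the segmentation model only inside the detected box reduces the number of pixels it must classify and filters out irrelevant background.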
Please upload an image* of a cat in PNG, JPEG, or JPG format.
Example: User Supplied Image
Example: Segmented Returned Image
*All images are stored for 5 minutes and then deleted from the server.
Author: Henry Taylor | Email: henrytaylor@microsoft.com
Copyright: Microsoft.