Image Content Classification using Google Cloud

In this blog post, we cover image classification using Google Cloud tools. Google Cloud offers a comprehensive toolset that covers a wide range of use cases. We recently implemented a project for one of our clients, and we'll detail the best practices and lessons learned from that implementation.

We used the following tools to implement image classification.

Vision API

The Vision API is one of the easiest APIs to use, and it returns an exhaustive list of attributes for a given image. One attribute we found helpful was color identification: the API returns the most dominant colors in a photo. However, the values come back as raw RGB components, which are not meaningful to most humans unless you are a web developer. For example, blue is (0, 0, 255). Using a Python package called webcolors, we translated those values into names that are meaningful to humans, such as green or violet.
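As a sketch of that translation step, the snippet below maps an RGB triple to the nearest named color. The small palette here is a stdlib-only stand-in for the much fuller mapping the webcolors package provides, and the input triple is assumed to come from the Vision API's dominant-colors response:

```python
# Translate an (R, G, B) triple, like those in the Vision API's
# dominant-colors response, into a human-readable color name.
# Sketch only: this small palette stands in for webcolors' full mapping.

PALETTE = {
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
    "green": (0, 128, 0), "blue": (0, 0, 255), "yellow": (255, 255, 0),
    "cyan": (0, 255, 255), "magenta": (255, 0, 255), "gray": (128, 128, 128),
    "violet": (238, 130, 238),
}

def nearest_color_name(rgb):
    """Return the palette name closest to rgb by squared Euclidean distance."""
    return min(
        PALETTE,
        key=lambda name: sum((p - c) ** 2 for p, c in zip(PALETTE[name], rgb)),
    )

print(nearest_color_name((10, 20, 240)))  # → blue
```

A nearest-neighbor lookup like this is forgiving of photographic noise: a slightly off-blue pixel still maps to "blue" rather than failing an exact-match lookup.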


AutoML

AutoML was very good for image classification without requiring us to tune the model or write low-level TensorFlow code. It also handled the train, test, and validation splits for us. Scenery, aerial photos, and indoor/outdoor pictures did really well because the model classified the image as a whole to identify the keyword. The keywords were also distinct enough that it was easy to see a clear separation between them. Although the Vision API was also able to pick up on these items, its labels were not specific enough for our keywords.

It was also easy to evaluate performance and quickly use the model in a production setting after it was trained.
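To illustrate what the automatic split does behind the scenes, here is a local approximation of an 80/10/10 train/test/validation split. The ratios and mechanics are an assumption about AutoML's defaults, not a description of its internals:

```python
import random

# Sketch of an 80/10/10 train/test/validation split, the kind AutoML
# performs automatically when a dataset is uploaded (ratios illustrative).

def split_dataset(items, train_frac=0.8, test_frac=0.1, seed=42):
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for repeatability
    n = len(items)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    validation = items[n_train + n_test:]
    return train, test, validation

train, test, validation = split_dataset([f"img_{i}.jpg" for i in range(100)])
print(len(train), len(test), len(validation))  # 80 10 10
```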

Pre-trained Object Detection

Pre-trained object detection models from TensorFlow's Model Zoo were used to detect object instances within an image. The best use case for this was counting people. One of the requirements was to determine whether a photo contained one, two, or three people, a group, a crowd, or nobody at all. The models trained on the COCO dataset have already learned to detect common objects from thousands of images of people over millions of training iterations, so it was better to leverage one of them than to train our own from scratch.
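The counting requirement can be sketched as a post-processing step over the detector's output. The (class name, confidence) pair format and the group/crowd thresholds below are illustrative assumptions, not part of the Model Zoo API:

```python
# Map "person" detections from a COCO-trained detector to the required
# labels: nobody, 1, 2, 3, group, or crowd. Detections are assumed to be
# (class_name, confidence) pairs; all thresholds here are illustrative.

def people_label(detections, min_score=0.5, group_min=4, crowd_min=10):
    count = sum(
        1 for name, score in detections if name == "person" and score >= min_score
    )
    if count == 0:
        return "nobody"
    if count >= crowd_min:
        return "crowd"
    if count >= group_min:
        return "group"
    return str(count)  # "1", "2", or "3"

print(people_label([("person", 0.9), ("person", 0.8), ("dog", 0.95)]))  # → 2
```

Filtering on a confidence threshold first matters: detectors emit many low-confidence boxes, and counting those would inflate a photo of two people into a "crowd".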

Also in the Model Zoo, the Atomic Visual Actions (AVA) pre-trained object detection model covers several actions out of the box, and where those detected well, we used them before considering a custom model.

Custom Object Detection

Custom object detection was used last, when all the previous methods did not yield high accuracy. Examples of such keywords were the sun and the moon. The Vision API did not pick up that a sun or moon was present unless it was the centerpiece of the photo, but a custom object detection model caught it even when the sun was far in the background, since the model was trained solely to identify that object.

We also noticed that custom object detection was able to detect actions in a photo, such as a person dribbling a ball on a court.

Image Classification Pipeline

After choosing the best technique for each keyword, we generated a separate model per keyword. The models were then integrated so that when an image was processed, many different labels tailored to our client were detected. For example, a picture could be labeled as aerial with three people standing outside in the daytime. Batch processing was also set up to run over all the images in a folder and output an Excel-readable spreadsheet with image name, label, and confidence score.
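A minimal sketch of that batch step, assuming each model exposes a callable returning a (label, confidence) pair. The model functions and CSV layout below are hypothetical stand-ins for the real pipeline:

```python
import csv
import io

# Hypothetical stand-ins for the trained models; each returns (label, confidence).
def scene_model(image_name):
    return ("aerial", 0.97)

def people_model(image_name):
    return ("3 people", 0.91)

def batch_classify(image_names, models):
    """Run every model on every image; return spreadsheet-ready rows."""
    rows = [("image", "label", "confidence")]
    for name in image_names:
        for model in models:
            label, conf = model(name)
            rows.append((name, label, conf))
    return rows

def to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # CSV opens directly in Excel
    return buf.getvalue()

rows = batch_classify(["beach.jpg"], [scene_model, people_model])
print(to_csv(rows))
```

Keeping one row per (image, model) pair makes the spreadsheet easy to filter by label type, and new models can be added to the pipeline just by appending to the model list.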


Feel free to reach out to us at if you have any questions or need help with any similar use cases.