Objectives & Prerequisites:
By the end of the article you will learn how to:
- Apply OCR (Optical Character Recognition) with Google’s Vision API.
- Apply the API to a live video stream from your webcam.
Before beginning, you will need:
- Basic coding experience in Python.
- Some high-level understanding of Computer Vision techniques.
Difficulty: Beginner/Moderate
So what is Google’s Computer Vision API and why should I use it?
Google has released an API that can extract information from images very accurately. There are many features within this API, but the one I want to focus on today is text extraction from images. This product is so powerful that it can read image text in different fonts, languages, and even orientations (sideways, upside down). Because of this, it is better than any open source software that I have tried.
What are some of the applications?
This can be incredibly useful to:
- Pull out meaningful text from scanned documents instead of transcribing by hand.
- Extract information from a stack of business cards instead of manually inputting data into a database.
- Pull useful information from a billboard.
Basically, many tedious tasks can be automated with this API, and it can be applied to a wide array of contexts. An added advantage is that the cost of using it is relatively low. Behind this product are also years of research and testing from Google, so why reinvent the wheel?
You can try it out here: https://cloud.google.com/vision/
There are many ways to use this API, but what I am going to show you today is how to combine the Google Vision API with live streaming through OpenCV, a fantastic Python package for image processing. The code will access your webcam, allowing you to wave different objects with text in front of it, such as a candy bar wrapper, a receipt, or even words on a t-shirt. Your command line terminal will then show the text that appears in the images, frame by frame.
Before We Get Going
- You will need to perform some pre-setup installations:
- “Pip install” the following packages:
  - opencv-python – imported as cv2 in Python code
  - google-cloud-vision
  - Pillow – imported as PIL
- You will also need to sign up for the Google Cloud Platform:
  - You can sign up for a free trial here: https://console.cloud.google.com/freetrial
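The installs above can be done in one command; a minimal sketch (package names as listed, though the exact pip invocation may vary with your environment):

```shell
# Install the three packages used in this tutorial.
pip install opencv-python google-cloud-vision Pillow
```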
Steps for OCR Demo
1. Vision API Setup: Create a Google Cloud project, enable the Vision API, and download a service account key as a JSON file — you will need this credentials file to authenticate.
2. Vision API Code: This code is based on the source code from the Vision API guide, which I modified a bit.
# export GOOGLE_APPLICATION_CREDENTIALS=yourcredentials.json
import io
import cv2
from PIL import Image

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

# Instantiates a client
client = vision.ImageAnnotatorClient()

def detect_text(path):
    """Detects text in the file."""
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations

    string = ''
    for text in texts:
        string += ' ' + text.description
    return string
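One thing worth knowing: text_annotations typically returns the full detected text as its first element, followed by the individual words, so the joined string repeats content. You can sanity-check the joining logic offline with a stub (the FakeAnnotation class and join_annotations helper are mine for illustration, not part of the API):

```python
# Minimal stub mimicking the shape of a Vision API text annotation,
# so the string-building loop in detect_text() can be exercised offline.
class FakeAnnotation:
    def __init__(self, description):
        self.description = description

def join_annotations(texts):
    # Same concatenation used in detect_text(): one leading space per item.
    string = ''
    for text in texts:
        string += ' ' + text.description
    return string

# First element is the full text; the rest are the individual words.
annotations = [FakeAnnotation('Hello world'),
               FakeAnnotation('Hello'),
               FakeAnnotation('world')]
print(join_annotations(annotations))  # → ' Hello world Hello world'
```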
3. OpenCV Code: This code is based on the OpenCV Stream code.
cap = cv2.VideoCapture(0)

while(True):
    # Capture frame-by-frame
    ret, frame = cap.read()
    file = 'live.png'
    cv2.imwrite(file, frame)

    # print OCR text
    print(detect_text(file))

    # Display the resulting frame
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
- The important thing to note is the cv2.VideoCapture() function: passing 0 as the input activates the webcam. The input is otherwise typically a path to a static video file or a link to a live video feed.
- Now you can save the entire portion of the code in a .py file and give it any name you want.
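One practical tweak: calling the Vision API on every single frame can be slow and can add up in cost. A minimal sketch of rate-limiting the OCR calls (the should_ocr helper and the every_n interval are my own additions, not part of the original code):

```python
# Hypothetical helper: only send every Nth frame to the Vision API.
def should_ocr(frame_index, every_n=30):
    """Return True when this frame should be OCR'd (roughly once a second at 30 fps)."""
    return frame_index % every_n == 0

# Inside the capture loop, the OCR call could then be gated like:
#   if should_ocr(frame_index):
#       print(detect_text(file))
print([i for i in range(0, 91) if should_ocr(i)])  # → [0, 30, 60, 90]
```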
4. The Code in Action: Now open a terminal in the folder where you saved your code and API key, and run the following on the command line first:
export GOOGLE_APPLICATION_CREDENTIALS=yourAPIkey.json
- Then run your Python file (YOUROCRFILE.py) on the command line immediately afterward.
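Put together, the terminal session looks roughly like this (the filenames are the placeholders from above, so substitute your own):

```shell
# Point the client library at your credentials, then start the script.
export GOOGLE_APPLICATION_CREDENTIALS=yourAPIkey.json
python YOUROCRFILE.py
```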
Final Result
As you can see, the detect_text() function defined earlier is embedded within the while loop. This enables the Vision API to check for text in the image frames while the live webcam is running. Below on the left is a screenshot of a backpack with our company logo on it; on the right side is the output, frame by frame. The words in the results are not consistent because I was waving the backpack around within the frame.
I hope you were able to learn how to use the Google Vision API in a novel way with OpenCV.