
Last updated on July 7, 2021.

In this tutorial you will learn how to use OpenCV to detect text in natural scene images using the EAST text detector.

OpenCV's EAST text detector is a deep learning model, based on a novel architecture and training design. It is capable of (1) running at near real-time at 13 FPS on 720p images and (2) obtaining state-of-the-art text detection accuracy.

In the remainder of this tutorial you will learn how to use OpenCV's EAST detector to automatically detect text in both images and video streams.

To learn how to apply text detection with OpenCV, just keep reading!

  • Update July 2021: Added two new sections, including alternative EAST text detector implementations, as well as a section on alternatives to the EAST model itself.

Looking for the source code to this post?

Jump Right To The Downloads Section

OpenCV Text Detection (EAST text detector)

In this tutorial, you will learn how to use OpenCV to detect text in images using the EAST text detector.

The EAST text detector requires that we are running OpenCV 3.4.2 or OpenCV 4 on our systems — if you do not already have OpenCV 3.4.2 or better installed, please refer to my OpenCV install guides and follow the one for your respective operating system.

In the first part of today's tutorial, I'll discuss why detecting text in natural scene images can be so challenging.

From there I'll briefly discuss the EAST text detector, why we use it, and what makes the algorithm so novel — I'll also include links to the original paper so you can read up on the details if you are so inclined.

Finally, I'll provide my Python + OpenCV text detection implementation so you can start applying text detection in your own applications.

Why is natural scene text detection so challenging?

Figure 1: Examples of natural scene images where text detection is challenging due to lighting conditions, image quality, and non-planar objects (Figure 1 of Mancas-Thillou and Gosselin).

Detecting text in constrained, controlled environments can typically be accomplished by using heuristic-based approaches, such as exploiting gradient information or the fact that text is typically grouped into paragraphs and characters appear on a straight line. An example of such a heuristic-based text detector can be seen in my previous blog post on Detecting machine-readable zones in passport images.

Natural scene text detection is different though — and much more challenging.

Due to the proliferation of cheap digital cameras, and not to mention the fact that nearly every smartphone now has a camera, we need to be highly concerned with the conditions the image was captured under — and furthermore, what assumptions we can and cannot make. I've included a summarized version of the natural scene text detection challenges described by Celine Mancas-Thillou and Bernard Gosselin in their excellent 2017 paper, Natural Scene Text Understanding, below:

  • Image/sensor noise: Sensor noise from a handheld camera is typically higher than that of a traditional scanner. Additionally, low-priced cameras will typically interpolate the pixels of raw sensors to produce real colors.
  • Viewing angles: Natural scene text can naturally have viewing angles that are not parallel to the text, making the text harder to recognize.
  • Blurring: Uncontrolled environments tend to have blur, especially if the end user is utilizing a smartphone that does not have some form of stabilization.
  • Lighting conditions: We cannot make any assumptions regarding our lighting conditions in natural scene images. It may be near dark, the flash on the camera may be on, or the sun may be shining brightly, saturating the entire image.
  • Resolution: Not all cameras are created equal — we may be dealing with cameras with sub-par resolution.
  • Non-paper objects: Most, but not all, paper is not reflective (at least in the context of paper you are trying to scan). Text in natural scenes may be reflective, including logos, signs, etc.
  • Non-planar objects: Consider what happens when you wrap text around a bottle — the text on the surface becomes distorted and deformed. While humans may still be able to easily "detect" and read the text, our algorithms will struggle. We need to be able to handle such use cases.
  • Unknown layout: We cannot use any a priori information to give our algorithms "clues" as to where the text resides.

As we'll learn, OpenCV's text detector implementation of EAST is quite robust, capable of localizing text even when it's blurred, reflective, or partially obscured:

Figure 2: OpenCV's EAST scene text detector will detect text even in blurry and obscured images.

I would suggest reading Mancas-Thillou and Gosselin's work if you are further interested in the challenges associated with text detection in natural scene images.

The EAST deep learning text detector

Figure 3: The structure of the EAST text detection Fully Convolutional Network (Figure 3 of Zhou et al.).

With the release of OpenCV 3.4.2 and OpenCV 4, we can now use a deep learning-based text detector called EAST, which is based on Zhou et al.'s 2017 paper, EAST: An Efficient and Accurate Scene Text Detector.

We call the algorithm "EAST" because it's an: Efficient and Accurate Scene Text detection pipeline.

The EAST pipeline is capable of predicting words and lines of text at arbitrary orientations on 720p images, and furthermore, can run at 13 FPS, according to the authors.

Perhaps most importantly, since the deep learning model is end-to-end, it is possible to sidestep computationally expensive sub-algorithms that other text detectors typically apply, including candidate aggregation and word partitioning.

To build and train such a deep learning model, the EAST method utilizes novel, carefully designed loss functions.

For more details on EAST, including architecture design and training methods, be sure to refer to the publication by the authors.

Project structure

To start, be sure to grab the source code + images for today's post by visiting the "Downloads" section. From there, simply use the tree terminal command to view the project structure:

$ tree --dirsfirst
.
├── images
│   ├── car_wash.png
│   ├── lebron_james.jpg
│   └── sign.jpg
├── frozen_east_text_detection.pb
├── text_detection.py
└── text_detection_video.py

1 directory, 6 files

Notice that I've provided three sample pictures in the images/ directory. You may wish to add your own images collected with your smartphone or ones you find online.

We'll be reviewing two .py files today:

  • text_detection.py : Detects text in static images.
  • text_detection_video.py : Detects text via your webcam or input video files.

Both scripts make use of the serialized EAST model (frozen_east_text_detection.pb) provided for your convenience in the "Downloads."

Implementation notes

The text detection implementation I am including today is based on OpenCV's official C++ example; however, I must admit that I had a bit of trouble when converting it to Python.

To start, there are no Point2f and RotatedRect functions in Python, and because of this, I could not 100% mimic the C++ implementation. The C++ implementation can produce rotated bounding boxes, but unfortunately the one I am sharing with you today cannot.

Secondly, the NMSBoxes function does not return any values for the Python bindings (at least for my OpenCV 4 pre-release install), ultimately resulting in OpenCV throwing an error. The NMSBoxes function may work in OpenCV 3.4.2 but I wasn't able to exhaustively test it.

I got around this issue by using my own non-maxima suppression implementation in imutils, but again, I don't believe these two are 100% interchangeable, as it appears NMSBoxes accepts additional parameters.
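
To illustrate the difference, here is a minimal hedged sketch (with placeholder boxes and scores, not code from the downloaded scripts) of how the two APIs are typically called — imutils returns the surviving boxes directly, while NMSBoxes (in builds where the Python bindings do return values) expects (x, y, w, h) boxes and returns the indices of the boxes to keep:

# placeholder boxes/scores purely for illustration
from imutils.object_detection import non_max_suppression
import numpy as np
import cv2

rects = [(10, 10, 110, 40), (12, 12, 112, 42)]
confidences = [0.9, 0.8]

# imutils: pass (startX, startY, endX, endY) boxes, get boxes back
boxes = non_max_suppression(np.array(rects), probs=confidences)

# OpenCV: pass (x, y, w, h) boxes, get back the indices of the kept boxes
xywh = [(x1, y1, x2 - x1, y2 - y1) for (x1, y1, x2, y2) in rects]
indices = cv2.dnn.NMSBoxes(xywh, confidences, 0.5, 0.4)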

Given all that, I've tried my best to provide you with the best OpenCV text detection implementation I could, using the working functions and resources I had. If you have any improvements to the method please do feel free to share them in the comments below.

Implementing our text detector with OpenCV

Before we get started, I want to point out that you will need at least OpenCV 3.4.2 (or OpenCV 4) installed on your system to utilize OpenCV's EAST text detector, so if you haven't already installed OpenCV 3.4.2 or better on your system, please refer to my OpenCV install guides.
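
If you aren't sure which version you have, a quick check from a Python shell will tell you:

# verify the installed OpenCV version before proceeding
import cv2
print(cv2.__version__)  # should report 3.4.2 or higher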

Next, make sure you have imutils installed/upgraded on your system as well:

$ pip install --upgrade imutils          

At this point your system is now configured, so open up text_detection.py and insert the following code:

# import the necessary packages
from imutils.object_detection import non_max_suppression
import numpy as np
import argparse
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str,
    help="path to input image")
ap.add_argument("-east", "--east", type=str,
    help="path to input EAST text detector")
ap.add_argument("-c", "--min-confidence", type=float, default=0.5,
    help="minimum probability required to inspect a region")
ap.add_argument("-w", "--width", type=int, default=320,
    help="resized image width (should be multiple of 32)")
ap.add_argument("-e", "--height", type=int, default=320,
    help="resized image height (should be multiple of 32)")
args = vars(ap.parse_args())

To begin, we import our required packages and modules on Lines 2-6. Notably we import NumPy, OpenCV, and my implementation of non_max_suppression from imutils.object_detection .

We then proceed to parse five command line arguments on Lines 9-20:

  • --image : The path to our input image.
  • --east : The EAST scene text detector model file path.
  • --min-confidence : Probability threshold to determine text. Optional with default=0.5 .
  • --width : Resized image width — must be a multiple of 32. Optional with default=320 .
  • --height : Resized image height — must be a multiple of 32. Optional with default=320 .

Important: The EAST text detector requires that your input image dimensions be multiples of 32, so if you choose to adjust your --width and --height values, make sure they are multiples of 32!
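
If your source material dictates odd dimensions, a tiny helper like the hypothetical one below (my own sketch, not part of the downloaded scripts) can snap a value down to the nearest multiple of 32:

# snap an arbitrary dimension down to the nearest multiple of 32
def round_down_to_32(value):
    return max(32, (value // 32) * 32)

print(round_down_to_32(700))  # 672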

From there, let's load our image and resize it:

# load the input image and grab the image dimensions
image = cv2.imread(args["image"])
orig = image.copy()
(H, W) = image.shape[:2]

# set the new width and height and then determine the ratio in change
# for both the width and height
(newW, newH) = (args["width"], args["height"])
rW = W / float(newW)
rH = H / float(newH)

# resize the image and grab the new image dimensions
image = cv2.resize(image, (newW, newH))
(H, W) = image.shape[:2]

On Lines 23 and 24, we load and copy our input image.

From there, Lines 30 and 31 determine the ratio of the original image dimensions to new image dimensions (based on the command line arguments provided for --width and --height ).

Then we resize the image, ignoring aspect ratio (Line 34).

In order to perform text detection using OpenCV and the EAST deep learning model, we need to extract the output feature maps of two layers:

# define the two output layer names for the EAST detector model that
# we are interested in -- the first is the output probabilities and the
# second can be used to derive the bounding box coordinates of text
layerNames = [
    "feature_fusion/Conv_7/Sigmoid",
    "feature_fusion/concat_3"]

We construct a list of layerNames on Lines 40-42:

  1. The first layer is our output sigmoid activation which gives us the probability of a region containing text or not.
  2. The second layer is the output feature map that represents the "geometry" of the image — we'll be able to use this geometry to derive the bounding box coordinates of the text in the input image.

Let's load OpenCV's EAST text detector:

# load the pre-trained EAST text detector
print("[INFO] loading EAST text detector...")
net = cv2.dnn.readNet(args["east"])

# construct a blob from the image and then perform a forward pass of
# the model to obtain the two output layer sets
blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
    (123.68, 116.78, 103.94), swapRB=True, crop=False)
start = time.time()
net.setInput(blob)
(scores, geometry) = net.forward(layerNames)
end = time.time()

# show timing information on text prediction
print("[INFO] text detection took {:.6f} seconds".format(end - start))

We load the neural network into memory using cv2.dnn.readNet by passing the path to the EAST detector (contained in our command line args dictionary) as a parameter on Line 46.

Then we prepare our image by converting it to a blob on Lines 50 and 51. To read more about this step, refer to Deep learning: How OpenCV's blobFromImage works.

To predict text we can simply set the blob as input and call net.forward (Lines 53 and 54). These lines are surrounded by grabbing timestamps so that we can print the elapsed time on Line 58.

By supplying layerNames as a parameter to net.forward, we are instructing OpenCV to return the two feature maps that we are interested in:

  • The output geometry map used to derive the bounding box coordinates of text in our input images
  • And similarly, the scores map, containing the probability of a given region containing text
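
If you want a feel for what these two volumes look like, here is a quick hedged sanity check (the shapes shown assume the default 320x320 input):

# both outputs are NCHW volumes whose spatial size is 1/4 of the input
print(scores.shape)    # (1, 1, 80, 80) -- one text probability per cell
print(geometry.shape)  # (1, 5, 80, 80) -- four distances plus one angle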

We'll need to loop over each of these values, one-by-one:

# grab the number of rows and columns from the scores volume, then
# initialize our set of bounding box rectangles and corresponding
# confidence scores
(numRows, numCols) = scores.shape[2:4]
rects = []
confidences = []

# loop over the number of rows
for y in range(0, numRows):
    # extract the scores (probabilities), followed by the geometrical
    # data used to derive potential bounding box coordinates that
    # surround text
    scoresData = scores[0, 0, y]
    xData0 = geometry[0, 0, y]
    xData1 = geometry[0, 1, y]
    xData2 = geometry[0, 2, y]
    xData3 = geometry[0, 3, y]
    anglesData = geometry[0, 4, y]

We start off by grabbing the dimensions of the scores volume (Line 63) and then initializing two lists:

  • rects : Stores the bounding box (x, y)-coordinates for text regions
  • confidences : Stores the probability associated with each of the bounding boxes in rects

We'll later be applying non-maxima suppression to these regions.

Looping over the rows begins on Line 68.

Lines 72-77 extract our scores and geometry data for the current row, y.

Next, we loop over each of the column indexes for our currently selected row:

    # loop over the number of columns
    for x in range(0, numCols):
        # if our score does not have sufficient probability, ignore it
        if scoresData[x] < args["min_confidence"]:
            continue

        # compute the offset factor as our resulting feature maps will
        # be 4x smaller than the input image
        (offsetX, offsetY) = (x * 4.0, y * 4.0)

        # extract the rotation angle for the prediction and then
        # compute the sin and cosine
        angle = anglesData[x]
        cos = np.cos(angle)
        sin = np.sin(angle)

        # use the geometry volume to derive the width and height of
        # the bounding box
        h = xData0[x] + xData2[x]
        w = xData1[x] + xData3[x]

        # compute both the starting and ending (x, y)-coordinates for
        # the text prediction bounding box
        endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
        endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
        startX = int(endX - w)
        startY = int(endY - h)

        # add the bounding box coordinates and probability score to
        # our respective lists
        rects.append((startX, startY, endX, endY))
        confidences.append(scoresData[x])

For every row, we begin looping over the columns on Line 80.

We need to filter out weak text detections by ignoring areas that do not have sufficiently high probability (Lines 82 and 83).

The EAST text detector naturally reduces volume size as the image passes through the network — our volume size is actually 4x smaller than our input image, so we multiply by four to bring the coordinates back into the coordinate space of our original image.
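
As a quick worked example of that mapping (illustrative values only):

# with a 320x320 input the scores/geometry maps are 80x80, so the
# cell at column x=10, row y=20 maps back to pixel (40.0, 80.0)
# in the resized input image
(offsetX, offsetY) = (10 * 4.0, 20 * 4.0)
print(offsetX, offsetY)  # 40.0 80.0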

I've included how you can extract the angle data on Lines 91-93; however, as I mentioned in the previous section, I wasn't able to construct a rotated bounding box from it as is performed in the C++ implementation — if you feel like tackling the problem, starting with the angle on Line 91 would be your first step.
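
If you do want to experiment, note that while the Python bindings lack a RotatedRect class, cv2.boxPoints is available and converts a ((centerX, centerY), (width, height), angle) tuple into its four corner points. A minimal sketch with hypothetical values (boxPoints expects the angle in degrees, while EAST predicts it in radians):

import cv2
import numpy as np

# hypothetical rotated rectangle: center, size, and angle in degrees
# (convert the EAST angle from radians with np.degrees first)
rot_rect = ((100.0, 50.0), (80.0, 20.0), np.degrees(-0.26))
corners = np.intp(cv2.boxPoints(rot_rect))
print(corners)  # four (x, y) corner points

# the corners could then be drawn with, e.g.:
# cv2.drawContours(orig, [corners], 0, (0, 255, 0), 2)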

From there, Lines 97-105 derive the bounding box coordinates for the text area.

We then update our rects and confidences lists, respectively (Lines 109 and 110).

We're almost finished!

The last step is to apply non-maxima suppression to our bounding boxes to suppress weak overlapping bounding boxes and then display the resulting text predictions:

# apply non-maxima suppression to suppress weak, overlapping bounding
# boxes
boxes = non_max_suppression(np.array(rects), probs=confidences)

# loop over the bounding boxes
for (startX, startY, endX, endY) in boxes:
    # scale the bounding box coordinates based on the respective
    # ratios
    startX = int(startX * rW)
    startY = int(startY * rH)
    endX = int(endX * rW)
    endY = int(endY * rH)

    # draw the bounding box on the image
    cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 2)

# show the output image
cv2.imshow("Text Detection", orig)
cv2.waitKey(0)

As I mentioned in the previous section, I could not use the non-maxima suppression in my OpenCV 4 install (cv2.dnn.NMSBoxes ) as the Python bindings did not return a value, ultimately causing OpenCV to error out. I wasn't fully able to test in OpenCV 3.4.2 so it may work in v3.4.2.

Instead, I have used my non-maxima suppression implementation available in the imutils package (Line 114). The results still look good; however, I wasn't able to compare my output to the NMSBoxes function to see if they were identical.

Lines 117-126 loop over our bounding boxes , scale the coordinates back to the original image dimensions, and draw the output to our orig image. The orig image is displayed until a key is pressed (Lines 129 and 130).

As a final implementation note I would like to mention that our two nested for loops used to loop over the scores and geometry volumes on Lines 68-110 would be an excellent example of where you could leverage Cython to dramatically speed up your pipeline. I've demonstrated the power of Cython in Fast, optimized 'for' pixel loops with OpenCV and Python.
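
Before reaching for Cython, a simpler first step is to vectorize the decode with NumPy. The sketch below is my own illustration (not part of the downloaded code) and mirrors the loop logic above under the same assumptions:

def decode_predictions_fast(scores, geometry, min_confidence=0.5):
    # boolean mask of cells whose text probability passes the threshold
    scoresData = scores[0, 0]
    mask = scoresData >= min_confidence
    (ys, xs) = np.where(mask)

    # offsets back into the (resized) input image coordinates
    offsetX = xs * 4.0
    offsetY = ys * 4.0

    # per-cell geometry: four distances and a rotation angle
    d0 = geometry[0, 0][mask]
    d1 = geometry[0, 1][mask]
    d2 = geometry[0, 2][mask]
    d3 = geometry[0, 3][mask]
    angles = geometry[0, 4][mask]
    cos = np.cos(angles)
    sin = np.sin(angles)

    # same box derivation as the nested loops, applied element-wise
    h = d0 + d2
    w = d1 + d3
    endX = (offsetX + cos * d1 + sin * d2).astype("int")
    endY = (offsetY - sin * d1 + cos * d2).astype("int")
    startX = (endX - w).astype("int")
    startY = (endY - h).astype("int")

    rects = np.stack([startX, startY, endX, endY], axis=1)
    return (rects, scoresData[mask].tolist())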

OpenCV text detection results

Are you ready to apply text detection to images?

Start by grabbing the "Downloads" for this blog post and unzip the files.

From there, you may execute the following command in your terminal (taking note of the two command line arguments):

$ python text_detection.py --image images/lebron_james.jpg \
    --east frozen_east_text_detection.pb
[INFO] loading EAST text detector...
[INFO] text detection took 0.142082 seconds

Your results should look similar to the following image:

Figure 4: Famous basketball player Lebron James' jersey text is successfully recognized with OpenCV and EAST text detection.

Three text regions are identified on Lebron James.

Now let's try to detect the text of a business sign:

$ python text_detection.py --image images/car_wash.png \
    --east frozen_east_text_detection.pb
[INFO] loading EAST text detector...
[INFO] text detection took 0.142295 seconds

Figure 5: Text is easily recognized with Python and OpenCV using EAST in this natural scene of a car wash station.

And finally, we'll try a road sign:

$ python text_detection.py --image images/sign.jpg \
    --east frozen_east_text_detection.pb
[INFO] loading EAST text detector...
[INFO] text detection took 0.141675 seconds

Figure 6: Scene text detection with Python + OpenCV and the EAST text detector successfully detects the text on this Spanish stop sign.

This scene contains a Spanish stop sign. The word "ALTO" is correctly detected by OpenCV and EAST.

As you can tell, EAST is quite accurate and relatively fast, taking approximately 0.14 seconds on average per image.

Text detection in video with OpenCV

Now that we've seen how to detect text in images, let's move on to detecting text in video with OpenCV.

This explanation will be very brief; please refer to the previous section for details as needed.

Open up text_detection_video.py and insert the following code:

# import the necessary packages
from imutils.video import VideoStream
from imutils.video import FPS
from imutils.object_detection import non_max_suppression
import numpy as np
import argparse
import imutils
import time
import cv2

We begin by importing our packages. We'll be using VideoStream to access a webcam and FPS to benchmark our frames per second for this script. Everything else is the same as in the previous section.

For convenience, let's define a new function to decode our predictions — it will be reused for each frame and make our loop cleaner:

def decode_predictions(scores, geometry):
    # grab the number of rows and columns from the scores volume, then
    # initialize our set of bounding box rectangles and corresponding
    # confidence scores
    (numRows, numCols) = scores.shape[2:4]
    rects = []
    confidences = []

    # loop over the number of rows
    for y in range(0, numRows):
        # extract the scores (probabilities), followed by the
        # geometrical data used to derive potential bounding box
        # coordinates that surround text
        scoresData = scores[0, 0, y]
        xData0 = geometry[0, 0, y]
        xData1 = geometry[0, 1, y]
        xData2 = geometry[0, 2, y]
        xData3 = geometry[0, 3, y]
        anglesData = geometry[0, 4, y]

        # loop over the number of columns
        for x in range(0, numCols):
            # if our score does not have sufficient probability,
            # ignore it
            if scoresData[x] < args["min_confidence"]:
                continue

            # compute the offset factor as our resulting feature
            # maps will be 4x smaller than the input image
            (offsetX, offsetY) = (x * 4.0, y * 4.0)

            # extract the rotation angle for the prediction and
            # then compute the sin and cosine
            angle = anglesData[x]
            cos = np.cos(angle)
            sin = np.sin(angle)

            # use the geometry volume to derive the width and height
            # of the bounding box
            h = xData0[x] + xData2[x]
            w = xData1[x] + xData3[x]

            # compute both the starting and ending (x, y)-coordinates
            # for the text prediction bounding box
            endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
            endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
            startX = int(endX - w)
            startY = int(endY - h)

            # add the bounding box coordinates and probability score
            # to our respective lists
            rects.append((startX, startY, endX, endY))
            confidences.append(scoresData[x])

    # return a tuple of the bounding boxes and associated confidences
    return (rects, confidences)

On Line 11 we define the decode_predictions function. This function is used to extract:

  1. The bounding box coordinates of a text region
  2. And the probability of a text region detection

This dedicated function will make the code easier to read and manage later on in this script.

Let's parse our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-east", "--east", type=str, required=True,
    help="path to input EAST text detector")
ap.add_argument("-v", "--video", type=str,
    help="path to optional input video file")
ap.add_argument("-c", "--min-confidence", type=float, default=0.5,
    help="minimum probability required to inspect a region")
ap.add_argument("-w", "--width", type=int, default=320,
    help="resized image width (should be multiple of 32)")
ap.add_argument("-e", "--height", type=int, default=320,
    help="resized image height (should be multiple of 32)")
args = vars(ap.parse_args())

Our command line arguments are parsed on Lines 69-80:

  • --east : The EAST scene text detector model file path.
  • --video : The path to our input video. Optional — if a video path is provided then the webcam will not be used.
  • --min-confidence : Probability threshold to determine text. Optional with default=0.5 .
  • --width : Resized image width (must be multiple of 32). Optional with default=320 .
  • --height : Resized image height (must be multiple of 32). Optional with default=320 .

The primary change from the image-only script in the previous section (in terms of command line arguments) is that I've substituted the --image argument with --video.

Important: The EAST text detector requires that your input image dimensions be multiples of 32, so if you choose to adjust your --width and --height values, ensure they are multiples of 32!

Next, we'll perform important initializations which mimic the previous script:

# initialize the original frame dimensions, new frame dimensions,
# and ratio between the dimensions
(W, H) = (None, None)
(newW, newH) = (args["width"], args["height"])
(rW, rH) = (None, None)

# define the two output layer names for the EAST detector model that
# we are interested in -- the first is the output probabilities and the
# second can be used to derive the bounding box coordinates of text
layerNames = [
    "feature_fusion/Conv_7/Sigmoid",
    "feature_fusion/concat_3"]

# load the pre-trained EAST text detector
print("[INFO] loading EAST text detector...")
net = cv2.dnn.readNet(args["east"])

The height/width and ratio initializations on Lines 84-86 will allow us to properly scale our bounding boxes later on.

Our output layer names are defined and we load our pre-trained EAST text detector on Lines 91-97.

The following block sets up our video stream and frames per second counter:

# if a video path was not supplied, grab the reference to the web cam
if not args.get("video", False):
    print("[INFO] starting video stream...")
    vs = VideoStream(src=0).start()
    time.sleep(1.0)

# otherwise, grab a reference to the video file
else:
    vs = cv2.VideoCapture(args["video"])

# start the FPS throughput estimator
fps = FPS().start()

Our video stream is set up for either:

  • A webcam (Lines 100-103)
  • Or a video file (Lines 106-107)

From there we initialize our frames per second counter on Line 110 and begin looping over incoming frames:

# loop over frames from the video stream
while True:
    # grab the current frame, then handle if we are using a
    # VideoStream or VideoCapture object
    frame = vs.read()
    frame = frame[1] if args.get("video", False) else frame

    # check to see if we have reached the end of the stream
    if frame is None:
        break

    # resize the frame, maintaining the aspect ratio
    frame = imutils.resize(frame, width=1000)
    orig = frame.copy()

    # if our frame dimensions are None, we still need to compute the
    # ratio of old frame dimensions to new frame dimensions
    if W is None or H is None:
        (H, W) = frame.shape[:2]
        rW = W / float(newW)
        rH = H / float(newH)

    # resize the frame, this time ignoring aspect ratio
    frame = cv2.resize(frame, (newW, newH))

We begin looping over video/webcam frames on Line 113.

Our frame is resized, maintaining aspect ratio (Line 124). From there, we grab dimensions and compute the scaling ratios (Lines 129-132). We then resize the frame again (it must be a multiple of 32), this time ignoring aspect ratio since we have stored the ratios for safe keeping (Line 135).

Inference and drawing text region bounding boxes take place on the following lines:

    # construct a blob from the frame and then perform a forward pass
    # of the model to obtain the two output layer sets
    blob = cv2.dnn.blobFromImage(frame, 1.0, (newW, newH),
        (123.68, 116.78, 103.94), swapRB=True, crop=False)
    net.setInput(blob)
    (scores, geometry) = net.forward(layerNames)

    # decode the predictions, then apply non-maxima suppression to
    # suppress weak, overlapping bounding boxes
    (rects, confidences) = decode_predictions(scores, geometry)
    boxes = non_max_suppression(np.array(rects), probs=confidences)

    # loop over the bounding boxes
    for (startX, startY, endX, endY) in boxes:
        # scale the bounding box coordinates based on the respective
        # ratios
        startX = int(startX * rW)
        startY = int(startY * rH)
        endX = int(endX * rW)
        endY = int(endY * rH)

        # draw the bounding box on the frame
        cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 2)

In this block we:

  • Detect text regions using EAST by creating a blob and passing it through the network (Lines 139-142)
  • Decode the predictions and apply NMS (Lines 146 and 147). We use the decode_predictions function defined previously in this script and my imutils non_max_suppression convenience function.
  • Loop over the bounding boxes and draw them on the frame (Lines 150-159). This involves scaling the boxes by the ratios gathered earlier.

From there we'll close out the frame processing loop as well as the script itself:

    # update the FPS counter
    fps.update()

    # show the output frame
    cv2.imshow("Text Detection", orig)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# if we are using a webcam, release the pointer
if not args.get("video", False):
    vs.stop()

# otherwise, release the file pointer
else:
    vs.release()

# close all windows
cv2.destroyAllWindows()

We update our fps counter on each iteration of the loop (Line 162) so that timings can be calculated and displayed (Lines 173-175) when we break out of the loop.

We show the output of EAST text detection on Line 165 and handle keypresses (Lines 166-170). If "q" is pressed for "quit", we break out of the loop and proceed to clean up and release pointers.

Video text detection results

To apply text detection to video with OpenCV, be sure to use the "Downloads" section of this blog post.

From there, open up a terminal and execute the following command (which will fire up your webcam since we aren't supplying a --video command line argument):

$ python text_detection_video.py --east frozen_east_text_detection.pb
[INFO] loading EAST text detector...
[INFO] starting video stream...
[INFO] elapsed time: 59.76
[INFO] approx. FPS: 8.85

Our OpenCV text detection video script achieves 7-9 FPS.

This result is not quite as fast as the authors reported (13 FPS); however, we are using Python instead of C++. By optimizing our for loops with Cython, we should be able to increase the speed of our text detection pipeline.

Alternative EAST text detection implementations

The EAST text detection model we used here today is a TensorFlow implementation compatible with OpenCV, meaning that you can use either TensorFlow or OpenCV to make text detection predictions with this model.

If you are looking for a PyTorch implementation, I suggest checking out this repo.

What other text detectors can we use besides EAST?

Figure 7: Alternative text detection methods include Tesseract, EasyOCR, and utilizing traditional computer vision algorithms and techniques.

To start, both Tesseract and EasyOCR offer both text detection (detecting where text is in an input image) and text recognition (OCR'ing the text itself):

  • This tutorial shows you how to use Tesseract to perform text detection
  • And this tutorial covers text detection with EasyOCR

Both of those tutorials use deep learning-based models to perform text detection and localization.
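
As a taste of what that looks like in practice, here is a minimal hedged sketch using EasyOCR (assuming you have installed it with pip install easyocr; the image path is one of this post's sample images):

import easyocr

# build a reader for English text (models download on first use)
reader = easyocr.Reader(["en"])

# readtext returns (bounding box, text, confidence) tuples
for (bbox, text, prob) in reader.readtext("images/sign.jpg"):
    print("{:.2f}: {}".format(prob, text))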

However, depending on your project, you may be able to get away with using basic image processing and computer vision techniques to perform text detection. The following tutorials show you how to do exactly that:

  1. Detecting machine-readable zones in passport images
  2. Recognizing digits with OpenCV and Python
  3. Credit card OCR with OpenCV and Python
  4. Bank check OCR with OpenCV and Python (Part I)
  5. Bank check OCR with OpenCV and Python (Part II)

While traditional computer vision and image processing techniques may not be as generalizable as deep learning-based text detection techniques, they can work surprisingly well in some situations.

What's next? I recommend PyImageSearch University.

Course information:
35+ total classes • 39h 44m video • Last updated: April 2022
★★★★★ 4.84 (128 Ratings) • 3,000+ Students Enrolled

I strongly believe that if you had the right teacher you could master computer vision and deep learning.

Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?

That's not the case.

All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that's exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.

If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you'll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.

Inside PyImageSearch University you'll find:

  • 35+ courses on essential computer vision, deep learning, and OpenCV topics
  • 35+ Certificates of Completion
  • 39+ hours of on-demand video
  • Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
  • Pre-configured Jupyter Notebooks in Google Colab
  • ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
  • ✓ Access to centralized code repos for all 450+ tutorials on PyImageSearch
  • ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
  • ✓ Access on mobile, laptop, desktop, etc.

Click here to join PyImageSearch University

Summary

In today's blog post, we learned how to use OpenCV's new EAST text detector to automatically detect the presence of text in natural scene images.

The text detector is not only accurate, but it's capable of running in near real-time at approximately 13 FPS on 720p images.

In order to provide an implementation of OpenCV's EAST text detector, I needed to convert OpenCV's C++ example; however, there were a number of challenges I encountered, such as:

  1. Not being able to use OpenCV's NMSBoxes for non-maxima suppression and instead having to use my implementation from imutils .
  2. Not being able to compute a true rotated bounding box due to the lack of Python bindings for RotatedRect .

I tried to keep my implementation as close to OpenCV's as possible, but keep in mind that my version is not 100% identical to the C++ version and that there may be one or two minor problems that will need to be resolved over time.

In any case, I hope you enjoyed today's tutorial on text detection with OpenCV!

To download the source code to this tutorial, and start applying text detection to your own images, just enter your email address in the form below.

Download the Source Code and FREE 17-page Resource Guide

Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!


Source: https://pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/
