    Detect Text in Images with Python - pytesseract vs. easyocr vs keras_ocr

    Introduction

    In this article, we will explore how to extract text from images using Python. There are several libraries available for text extraction, but we will focus on comparing three popular ones: pytesseract, easyocr, and keras_ocr. To demonstrate the capabilities of these libraries, we will use a dataset called Text OCR, which contains over a million annotations of text in images. Its size and variety make it a good test bed for comparing how these libraries perform.

    Overview of the Dataset

    The Text OCR dataset consists of numerous images annotated with the text they contain. The dataset is organized into training and validation folders and includes several CSV and Parquet files that contain annotations, as well as metadata for each image. Each annotation includes a unique ID, associated image ID, bounding boxes for words, and the text itself.
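
To make the annotation layout concrete, here is a minimal sketch that joins word annotations to their image metadata with pandas. The two small tables are toy stand-ins, and every column name (`image_id`, `bbox`, `utf8_string`, `file_name`) as well as the `[x, y, width, height]` box layout is an assumption about the dataset's schema, not something confirmed by the article.

```python
import pandas as pd

# Toy stand-ins for the annotation and image tables.
# All column names and the bbox layout are assumptions for illustration.
annot = pd.DataFrame({
    "id": ["a1_1", "a1_2", "b2_1"],
    "image_id": ["a1", "a1", "b2"],
    "bbox": [[10, 20, 50, 15], [70, 20, 40, 15], [5, 5, 30, 10]],  # assumed [x, y, width, height]
    "utf8_string": ["OPEN", "24H", "EXIT"],
})
imgs = pd.DataFrame({
    "id": ["a1", "b2"],
    "width": [640, 800],
    "height": [480, 600],
    "file_name": ["train/a1.jpg", "train/b2.jpg"],
})

# Attach the image metadata to every word annotation.
# Both tables have an "id" column, so the image one becomes "id_img".
merged = annot.merge(imgs, left_on="image_id", right_on="id", suffixes=("", "_img"))

# Collect the ground-truth words for each image.
words_per_image = merged.groupby("image_id")["utf8_string"].apply(list)
print(words_per_image.to_dict())
```

A per-image word list like `words_per_image` is a convenient ground truth to check each library's output against later on.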

    Setting Up the Environment

    For our experiments, we'll work within a Kaggle notebook, where most common Python libraries come pre-installed. We start by importing the essentials:

    import pandas as pd
    import numpy as np
    import glob
    from tqdm import tqdm
    import matplotlib.pyplot as plt
    from PIL import Image
    

    We'll read in the Parquet files containing annotations and image metadata, and use glob to retrieve the paths for the image files.
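
The loading step could look roughly like the sketch below. The file names `annot.parquet` and `img.parquet`, the `.jpg` extension, and the helper names `load_tables` and `list_images` are all hypothetical; adjust them to whatever the dataset folder actually contains.

```python
import glob
import os

import pandas as pd


def load_tables(data_dir):
    """Read the annotation and image-metadata tables.

    The file names used here are assumptions, not confirmed by the article.
    Reading Parquet needs a Parquet engine such as pyarrow to be installed.
    """
    annot = pd.read_parquet(os.path.join(data_dir, "annot.parquet"))
    imgs = pd.read_parquet(os.path.join(data_dir, "img.parquet"))
    return annot, imgs


def list_images(data_dir, pattern="**/*.jpg"):
    """Return the sorted paths of all images found under data_dir."""
    return sorted(glob.glob(os.path.join(data_dir, pattern), recursive=True))
```

Sorting the paths keeps the order stable between runs, which helps when the same example image is used for all three libraries.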

    Data Exploration

    Before diving into text extraction, we need to visualize some of the images in the dataset along with their annotations. This helps us understand what kind of images and text we are working with.
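
One way to do this is to draw the annotated boxes and words over the image with matplotlib. The sketch below assumes boxes are stored as `(x, y, width, height)` and uses a blank synthetic image so that it runs without the dataset; `show_annotations` is a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


def show_annotations(image, boxes, texts, ax=None):
    """Draw each (x, y, width, height) box and its text over the image."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(image)
    for (x, y, w, h), text in zip(boxes, texts):
        ax.add_patch(Rectangle((x, y), w, h, fill=False, edgecolor="red", linewidth=1.5))
        ax.text(x, y - 2, text, color="red", fontsize=8)
    ax.axis("off")
    return ax


# Synthetic white image standing in for a dataset image.
demo = np.full((100, 200, 3), 255, dtype=np.uint8)
ax = show_annotations(demo, [(20, 30, 60, 20)], ["OPEN"])
```

With real data, `image` would come from `plt.imread` or `PIL.Image.open` on one of the image paths, and the boxes and texts from the annotation rows for that image.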

    Text Extraction Methods

    1. pytesseract

    The first method we will explore is pytesseract, a Python wrapper for Google's Tesseract-OCR Engine. Although pytesseract is widely used for document text extraction, it may not perform as well on diverse image types typically found in datasets like Text OCR.

    To use pytesseract, we pass the image (a file path or a PIL image) to image_to_string:

    import pytesseract
    
    # image_file_name is the path to one image from the dataset
    text = pytesseract.image_to_string(image_file_name, lang='eng')
    print(text)
    

    We will run this on an example image and analyze the output, keeping in mind that the results on this kind of image may be far from optimal.
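
image_to_string returns only a plain string. If word-level boxes and confidences are wanted for comparison with the other libraries, pytesseract also offers image_to_data. The sketch below is one possible way to tidy its dictionary output; `tesseract_words` and `run_tesseract` are hypothetical helper names, and the demo dictionary is hand-made so the example runs without Tesseract installed.

```python
import pandas as pd


def tesseract_words(data, min_conf=0.0):
    """Turn an image_to_data dictionary into a table of words with boxes."""
    df = pd.DataFrame(data)
    df["conf"] = df["conf"].astype(float)  # conf may arrive as str or number
    df["text"] = df["text"].astype(str).str.strip()
    # Rows with conf == -1 describe layout (blocks, lines), not words.
    keep = (df["conf"] >= min_conf) & (df["text"] != "")
    cols = ["text", "conf", "left", "top", "width", "height"]
    return df.loc[keep, cols].reset_index(drop=True)


def run_tesseract(image_path, min_conf=0.0):
    """Run Tesseract on one image (needs pytesseract and the Tesseract engine)."""
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), lang="eng", output_type=pytesseract.Output.DICT
    )
    return tesseract_words(data, min_conf)


# Hand-made example shaped like image_to_data output (reduced set of keys).
fake = {
    "left": [0, 10, 70], "top": [0, 20, 20],
    "width": [200, 50, 40], "height": [100, 15, 15],
    "conf": ["-1", "96.1", "12.0"], "text": ["", "OPEN", "Z4H"],
}
words = tesseract_words(fake, min_conf=50)
print(words)
```

The `min_conf` threshold of 50 is an arbitrary choice for the demo, not a recommended value.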

    2. easyocr

    Next, we will test easyocr, which relies on deep learning models for both text detection and recognition. It's slightly slower but often yields better results than pytesseract on images that are not clean documents.

    To use easyocr, we create a reader object and invoke the readtext method:

    import easyocr
    
    reader = easyocr.Reader(['en'])
    results = reader.readtext(image_file_name)
    

    The output is a list with one entry per detected text region, each holding the bounding box (four corner points), the recognized text, and a confidence score.
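
For easier inspection, the list returned by readtext can be flattened into a pandas DataFrame. This is a small sketch with a hypothetical helper name, `easyocr_to_frame`, run on a hand-made result so that it does not need easyocr or its model weights.

```python
import pandas as pd


def easyocr_to_frame(results):
    """Convert readtext output (bbox, text, confidence) into a DataFrame."""
    columns = ["text", "conf", "x_min", "y_min", "x_max", "y_max"]
    rows = []
    for bbox, text, conf in results:
        xs = [float(point[0]) for point in bbox]
        ys = [float(point[1]) for point in bbox]
        rows.append({
            "text": text, "conf": float(conf),
            "x_min": min(xs), "y_min": min(ys),
            "x_max": max(xs), "y_max": max(ys),
        })
    return pd.DataFrame(rows, columns=columns)


# Hand-made example in the shape readtext returns: four corner points, text, score.
fake_results = [
    ([[10, 20], [60, 20], [60, 35], [10, 35]], "OPEN", 0.98),
    ([[70, 22], [110, 22], [110, 36], [70, 36]], "24H", 0.61),
]
easy_df = easyocr_to_frame(fake_results)
print(easy_df)
```

Reducing the four corners to an axis-aligned box loses the rotation of slanted text, which is acceptable for a quick look but not for precise overlap measurements.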

    3. keras_ocr

    The final library we will compare is keras_ocr, which combines a detector and recognizer under a unified pipeline. While keras_ocr is not pre-installed in Kaggle, we can easily install it using pip:

    !pip install keras-ocr
    

    Then we can run the text extraction as follows:

    import keras_ocr
    
    pipeline = keras_ocr.pipeline.Pipeline()
    results = pipeline.recognize([image_file_name])
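
recognize returns one list per input image, and each list holds (word, box) pairs where the box is a 4x2 array of corner points. The sketch below flattens that nested structure into a single table; `keras_ocr_to_frame` is a hypothetical helper, and the predictions are hand-made so the example runs without keras_ocr installed.

```python
import numpy as np
import pandas as pd


def keras_ocr_to_frame(predictions_per_image):
    """Flatten per-image lists of (word, box) pairs into one DataFrame."""
    columns = ["image_index", "text", "x_min", "y_min", "x_max", "y_max"]
    rows = []
    for image_index, predictions in enumerate(predictions_per_image):
        for word, box in predictions:
            box = np.asarray(box, dtype=float)  # shape (4, 2): corner points
            x_min, y_min = box.min(axis=0)
            x_max, y_max = box.max(axis=0)
            rows.append({
                "image_index": image_index, "text": word,
                "x_min": x_min, "y_min": y_min,
                "x_max": x_max, "y_max": y_max,
            })
    return pd.DataFrame(rows, columns=columns)


# Hand-made predictions for two images; the second has no detections.
fake_predictions = [
    [("open", np.array([[10, 20], [60, 20], [60, 35], [10, 35]]))],
    [],
]
keras_df = keras_ocr_to_frame(fake_predictions)
print(keras_df)
```

Note that there is no confidence column here, because the (word, box) pairs carry no score. The words in the fake data are lowercase on purpose: as far as I know the default recognizer outputs lowercase letters and digits, so comparisons against ground truth are safest done case-insensitively.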
    

    Comparing Results

    Having extracted text from images using all three methods, we will compare their performance. We will focus on key aspects such as accuracy, detection of bounding boxes, and any missing annotations.
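
As one simple, admittedly crude way to put a number on accuracy, the sketch below computes the share of annotated words that a library recovered, ignoring case and position. The metric and the name `word_recall` are illustrative choices, not the measure used in the article.

```python
def word_recall(truth_words, predicted_words):
    """Fraction of distinct ground-truth words found among the predictions."""
    truth = {w.strip().lower() for w in truth_words if w.strip()}
    predicted = {w.strip().lower() for w in predicted_words}
    if not truth:
        return float("nan")  # nothing to recover in this image
    return len(truth & predicted) / len(truth)


score = word_recall(["OPEN", "24H", "Exit"], ["open", "exit", "door"])
print(round(score, 3))
```

Because it ignores bounding boxes and duplicate words, this score says nothing about where text was found, so it complements rather than replaces a visual check.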

    Visualization

    To visualize the results, we can use built-in tools from keras_ocr, such as keras_ocr.tools.drawAnnotations, to draw annotations directly onto the images, allowing us to clearly see how well each library performed.

    We will create a function to facilitate the plotting of results side-by-side for a direct comparison.
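
Such a function might look like the sketch below. It uses plain matplotlib polygons instead of the keras_ocr drawing tools so that it runs without keras_ocr installed, and it assumes each library's output has already been converted to a list of (text, four corner points) pairs; `plot_compare` is a hypothetical name.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon


def plot_compare(image, results_by_library):
    """Show one panel per library with its detected boxes and words."""
    n = len(results_by_library)
    fig, axes = plt.subplots(1, n, figsize=(6 * n, 6), squeeze=False)
    for ax, (name, detections) in zip(axes[0], results_by_library.items()):
        ax.imshow(image)
        for text, box in detections:
            points = np.asarray(box, dtype=float)  # four (x, y) corners
            ax.add_patch(Polygon(points, closed=True, fill=False, edgecolor="red"))
            ax.text(points[:, 0].min(), points[:, 1].min() - 2, text,
                    color="red", fontsize=8)
        ax.set_title(name)
        ax.axis("off")
    return fig


# Synthetic image and hand-made detections, just to exercise the function.
demo = np.full((100, 200, 3), 255, dtype=np.uint8)
box = [[10, 20], [60, 20], [60, 35], [10, 35]]
fig = plot_compare(demo, {"easyocr": [("OPEN", box)], "keras_ocr": [("open", box)]})
```

Passing a dictionary keeps the panel titles tied to the library names, so adding a third library is only a matter of adding one more entry.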

    Conclusion

    We have explored three libraries for text extraction from images (pytesseract, easyocr, and keras_ocr) and compared their output on a large annotated dataset. Each library has its strengths and weaknesses, and the choice of which to use may depend on the specific use case.


    Keywords

    • Python
    • Image processing
    • Text extraction
    • pytesseract
    • easyocr
    • keras_ocr
    • Optical Character Recognition (OCR)
    • Dataset
    • Annotations

    FAQ

    Q1: What is Optical Character Recognition (OCR)?
    A: OCR is a technology that converts different types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data.

    Q2: Which Python library should I choose for text extraction?
    A: The choice of library may depend on your project's specific requirements. For document-like texts, pytesseract might suffice. For a more diverse set of images, easyocr or keras_ocr may be preferable due to their better performance with complex backgrounds.

    Q3: What is the advantage of using deep learning-based OCR libraries?
    A: Deep learning-based libraries, like easyocr and keras_ocr, tend to be more accurate and robust in detecting text in a variety of fonts and styles, especially in challenging image conditions.

    Q4: Can I run these libraries on my local machine?
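    A: Yes! You can install pytesseract, easyocr, and keras_ocr in your local Python environment with pip. Pay attention to the dependencies: pytesseract is only a wrapper, so the Tesseract engine itself must be installed separately, while easyocr builds on PyTorch and keras_ocr on TensorFlow, and both download pretrained model weights the first time they are used.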

    Q5: How does the performance differ between these libraries?
    A: Performance can vary based on the complexity of the images and text. It's suggested to test different libraries on your specific dataset to determine which provides the best results.
