    General Processors in Document AI

    Introduction

    Welcome to the future of documents! In this article, we will explore the general processors available in Google Cloud's Document AI. These pre-trained models are designed to process various documents efficiently and effectively, catering to a wide range of document processing needs.

    Overview of General Processors

    Currently, Document AI provides three general processors:

    1. Document OCR: This processor is designed to extract text from scanned documents, whether typed or handwritten. By submitting a scanned document to the Document OCR processor, you receive the extracted text in return. Even images of poor quality can be processed effectively to retrieve the needed information.

      Example of Document OCR Output

      In the example, the scanned document is shown on the right and the text extracted from it on the left.

    2. Form Parser: If you're dealing with a scanned form and need to analyze the information contained in each field, the Form Parser is your go-to solution. While OCR captures the text, the Form Parser goes a step further, detecting form fields and returning the data as key-value pairs. It can also detect tables within the document, distinguishing between header and body rows for easier analysis.

    3. Document Quality: This processor assesses the quality of a scanned document, providing a score based on various defects. For example, it may output a low score for documents that are blurred, hard to read, or otherwise compromised in quality.
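
A quality score like this is typically used as a gate before further processing. The following is a minimal sketch that works on a plain dictionary shaped like the JSON form of a processed document. The key names (`imageQualityScores`, `qualityScore`, `detectedDefects`), the defect labels, the sample values, and the 0.5 threshold are assumptions for illustration and should be checked against the actual processor output.

```python
def pages_needing_rescan(document, min_score=0.5):
    """Return (page_number, score, most_likely_defect) for low-quality pages."""
    flagged = []
    for number, page in enumerate(document.get("pages", []), start=1):
        scores = page.get("imageQualityScores", {})
        if "qualityScore" not in scores:
            continue  # no quality information for this page
        score = scores["qualityScore"]
        if score < min_score:
            defects = scores.get("detectedDefects", [])
            # Pick the defect the processor is most confident about, if any.
            worst = max(defects, key=lambda d: d.get("confidence", 0.0), default=None)
            flagged.append((number, score, worst["type"] if worst else None))
    return flagged


# Hand-written sample data (illustrative only, not real processor output).
sample = {
    "pages": [
        {"imageQualityScores": {"qualityScore": 0.92, "detectedDefects": []}},
        {"imageQualityScores": {"qualityScore": 0.31, "detectedDefects": [
            {"type": "quality/defect_blurry", "confidence": 0.88},
            {"type": "quality/defect_dark", "confidence": 0.40},
        ]}},
    ]
}

print(pages_needing_rescan(sample))
```

With a gate like this, a blurred page can be sent back for rescanning instead of being passed on to OCR or the Form Parser.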

    Accessing Form Parser Information

    To better understand how the Form Parser organizes the extracted data, we can look at a sample output. The form fields are stored as an array of key-value pairs within the document.pages.form_fields field. Each entry holds a field name and the field value extracted from the document. For example, if a "Date" field is detected, the Form Parser might output an entry with the name "Date" and the corresponding value "September 14, 2019."

    Additionally, if your document contains tables, each page's tables field lists the tables detected on that page. The first row is usually recognized as the header row, followed by body rows that contain the actual data.
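
The header and body rows described above can be walked in a few lines. This is a sketch over a plain dictionary shaped like the JSON form of a table, where each cell points into the document's full text through start and end offsets. The key names (`headerRows`, `bodyRows`, `cells`, `layout`, `textAnchor`, `textSegments`) are assumed from that JSON shape, and the sample table and the `cell` helper are made up for illustration.

```python
def anchor_text(full_text, layout):
    """Resolve a layout's text anchor against the document's full text."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # Offsets are serialized as strings in JSON; startIndex is omitted when 0.
    return "".join(
        full_text[int(s.get("startIndex", 0)):int(s["endIndex"])] for s in segments
    ).strip()


def table_rows(full_text, table):
    """Return (header_rows, body_rows), each a list of rows of cell strings."""
    def rows(key):
        return [
            [anchor_text(full_text, c["layout"]) for c in row.get("cells", [])]
            for row in table.get(key, [])
        ]
    return rows("headerRows"), rows("bodyRows")


def cell(start, end):
    """Hypothetical helper that builds one sample cell."""
    segment = {"endIndex": str(end)}
    if start:
        segment["startIndex"] = str(start)
    return {"layout": {"textAnchor": {"textSegments": [segment]}}}


# Hand-written sample (illustrative only).
sample_text = "Item Qty\nPen 2\n"
sample_table = {
    "headerRows": [{"cells": [cell(0, 4), cell(5, 9)]}],
    "bodyRows": [{"cells": [cell(9, 12), cell(13, 15)]}],
}

header, body = table_rows(sample_text, sample_table)
print(header)
print(body)
```

Keeping header and body rows separate makes it straightforward to load the result into a spreadsheet or data frame later.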

    Implementing Key-Value Pair Extraction with Python

    Here's a simple example of how to extract key-value pairs from a document object using Python:

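    from google.cloud import documentai_v1 as documentai

    # processor_name and file_path are placeholders you must define.
    # processor_name is the full resource name of a Form Parser processor:
    # projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID
    client = documentai.DocumentProcessorServiceClient()
    with open(file_path, "rb") as f:
        raw_document = documentai.RawDocument(content=f.read(), mime_type="application/pdf")
    response = client.process_document(request={"name": processor_name, "raw_document": raw_document})

    for page in response.document.pages:
        for form_field in page.form_fields:
            # Name and value are separate layouts, each with its own text and confidence.
            field_name = form_field.field_name.text_anchor.content.strip()
            field_value = form_field.field_value.text_anchor.content.strip()
            confidence = form_field.field_value.confidence

            print(f"Field Name: {field_name}, Field Value: {field_value}, Confidence: {confidence}")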
    

    The output will show the extracted keys and values, along with their confidence scores.
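
The confidence scores are useful for deciding which pairs to trust. As a minimal sketch, assuming the extracted fields have been collected into (name, value, confidence) tuples, the function below keeps only the pairs at or above a threshold. The function name, the 0.8 default, and the sample values are illustrative assumptions, not part of Document AI.

```python
def confident_fields(fields, min_confidence=0.8):
    """Keep name/value pairs whose confidence meets the threshold."""
    return {
        name: value
        for name, value, confidence in fields
        if confidence >= min_confidence
    }


# Made-up sample values for illustration.
extracted = [
    ("Date", "September 14, 2019", 0.97),
    ("Name", "J0hn", 0.42),
]

print(confident_fields(extracted))
```

Pairs that fall below the threshold can be routed to manual review rather than silently dropped.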

    Conclusion

    In this exploration of general processors, we learned about the capabilities of the Document OCR, Form Parser, and Document Quality processors. Together, they return the text of a document, its form fields and tables as structured data, and a score for the quality of the scan.

    Next time, we'll delve into specialized processors that can extract even more nuanced information from common document types. If you are eager to experiment with the Form Parser, you can follow the code lab linked in the description below.

    For further reading, please check out the documentation on general processors.


    Keywords

    Document AI, general processors, Document OCR, Form Parser, Document Quality, text extraction, key-value pairs, table processing.


    FAQ

    Q1: What are general processors in Document AI?
    A1: General processors are pre-trained models that help in extracting and analyzing information from various document types. Currently, there are three general processors: Document OCR, Form Parser, and Document Quality.

    Q2: How does the Document OCR processor work?
    A2: The Document OCR processor extracts text from scanned documents, which can be either typed or handwritten. It is capable of processing images even if they are of poor quality.

    Q3: What does the Form Parser do?
    A3: The Form Parser detects form fields in a scanned document and returns the extracted data as key-value pairs. It can also process tables and categorize header and body rows.

    Q4: How can I assess the quality of a scanned document?
    A4: The Document Quality processor evaluates scanned documents and provides a quality score based on identified defects such as blurriness or legibility issues.

    Q5: Can I use Python to extract data from documents using Document AI?
    A5: Yes, you can use Python to implement the extraction of key-value pairs from a document object processed by the Form Parser, as demonstrated with the provided code snippet.
