General Processors in Document AI

Introduction

Welcome to the future of documents! In this article, we explore the general processors available in Google Cloud's Document AI. These pre-trained models handle common tasks such as text extraction, form parsing, and quality assessment across many kinds of documents.

Overview of General Processors

Currently, Document AI provides three general processors:

Document OCR: This processor extracts text from scanned documents, whether typed or handwritten. You submit a scanned document to the Document OCR processor and receive the extracted text in return. It can often recover the needed text even from poor-quality images.

In the accompanying figure, the scanned document is on the right and the text extracted from it is on the left.

Form Parser: If you have a scanned form and need to analyze the information in each field, the Form Parser is your go-to solution. While OCR captures the text, the Form Parser goes a step further: it detects form fields and returns their contents as key-value pairs. It also detects tables within the document and distinguishes header rows from body rows for easier analysis.

Document Quality: This processor assesses the quality of a scanned document and returns a score based on the defects it detects. For example, it may output a low score for a document that is blurred or otherwise hard to read.
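As an illustration of how a quality score might be used downstream, here is a minimal sketch that decides whether a scan is good enough to process and names the most likely defect. The function summarize_quality, the 0-to-1 score range, the 0.5 cutoff, and the defect names are all assumptions made for this example; they are not the processor's actual output format.

```python
def summarize_quality(quality_score, defect_scores, min_score=0.5):
    """Return (passes, most_likely_defect) for one scanned page.

    quality_score: overall score, assumed here to lie between 0 and 1.
    defect_scores: hypothetical mapping of defect name to likelihood.
    """
    passes = quality_score >= min_score
    # The defect with the highest likelihood is reported as the main problem.
    worst = max(defect_scores, key=defect_scores.get) if defect_scores else None
    return passes, worst


# Made-up values for a blurry scan.
passes, worst_defect = summarize_quality(0.31, {"blurry": 0.82, "dark": 0.10})
print(passes, worst_defect)
```

A check like this could run before OCR or form parsing, so that unreadable scans are sent back for rescanning instead of producing unreliable text.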

Accessing Form Parser Information

To better understand how the Form Parser organizes the extracted data, we can look at a sample output. The form fields are stored as an array of key-value pairs in the document.pages.form_fields field. Each entry pairs a field name with the value extracted for it. For example, if the form contains a "Date" field, the Form Parser might return an entry whose name is "Date" and whose value is "September 14, 2019."
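To make the key-value structure concrete, the sketch below turns a list of form fields into an ordinary Python dictionary. The FormField class is a simplified, hypothetical stand-in for the parsed output, not the client library's type. The sample values are made up for illustration, and the "Date" entry mirrors the example above.

```python
from dataclasses import dataclass


@dataclass
class FormField:
    """Simplified stand-in for one detected form field (hypothetical type)."""
    name: str
    value: str
    confidence: float


def fields_to_dict(form_fields):
    """Map each cleaned field name to its cleaned value.

    If a name appears more than once, the last occurrence wins.
    """
    result = {}
    for field in form_fields:
        # Extracted text often carries trailing colons, spaces, or newlines.
        key = field.name.strip().rstrip(":").strip()
        result[key] = field.value.strip()
    return result


sample_fields = [
    FormField(name="Date:", value="September 14, 2019\n", confidence=0.98),
    FormField(name="Name: ", value="Jane Doe\n", confidence=0.95),
]

form_data = fields_to_dict(sample_fields)
print(form_data)
```

A flat dictionary is convenient for lookups such as form_data["Date"], but it drops the confidence scores, so keep the original entries if you need them later.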

Additionally, if your document contains tables, you can access them through the tables field, which lists the tables detected on each page. Each table separates its header rows from its body rows: the first row is usually recognized as the header, and the body rows that follow contain the actual data.
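The header and body split makes it straightforward to turn a table into one record per row. The sketch below shows the idea with a hypothetical Table class that holds only cell text. In the real output each cell is a richer object, so treat this as an outline of the approach and not as the library's data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Table:
    """Simplified stand-in for a detected table (hypothetical, text only)."""
    header_rows: List[List[str]] = field(default_factory=list)
    body_rows: List[List[str]] = field(default_factory=list)


def table_to_records(table):
    """Pair each body-row cell with the column name from the first header row."""
    if not table.header_rows:
        return []
    columns = [cell.strip() for cell in table.header_rows[0]]
    records = []
    for row in table.body_rows:
        cells = [cell.strip() for cell in row]
        # zip stops at the shorter sequence, so surplus cells are dropped.
        records.append(dict(zip(columns, cells)))
    return records


# Made-up table contents for illustration.
sample_table = Table(
    header_rows=[["Item", "Quantity"]],
    body_rows=[["Pens", "3"], ["Notebooks ", "2"]],
)

records = table_to_records(sample_table)
for record in records:
    print(record)
```

Records shaped like this can be passed directly to tools that expect a list of dictionaries, such as csv.DictWriter.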

Implementing Key-Value Pair Extraction with Python

Here's a simple example of how to extract key-value pairs from a document object using Python:

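# Assumes processor_name ("projects/.../locations/.../processors/...") and
# raw_document (a documentai.RawDocument holding the file bytes) are already defined.
from google.cloud import documentai

client = documentai.DocumentProcessorServiceClient()
response = client.process_document(request={"name": processor_name, "raw_document": raw_document})

for page in response.document.pages:
    for form_field in page.form_fields:
        # text_anchor.content holds the text detected for the field name or value
        field_name = form_field.field_name.text_anchor.content.strip()
        field_value = form_field.field_value.text_anchor.content.strip()
        # Confidence is reported on the name and value layouts; the name's is used here
        confidence = form_field.field_name.confidence

        print(f"Field Name: {field_name}, Field Value: {field_value}, Confidence: {confidence}")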

The output will show the extracted keys and values, along with their confidence scores.
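One common use of the confidence scores is to route low-confidence fields to manual review. The following is a minimal sketch of that idea. It works on plain (name, value, confidence) tuples, so it runs without a Document AI response, and both the filter_confident helper and the 0.8 threshold are assumptions chosen for illustration.

```python
def filter_confident(fields, threshold=0.8):
    """Split (name, value, confidence) tuples into accepted and needs-review lists."""
    accepted, review = [], []
    for name, value, confidence in fields:
        target = accepted if confidence >= threshold else review
        target.append((name, value, confidence))
    return accepted, review


# Made-up extraction results for illustration.
extracted = [
    ("Date", "September 14, 2019", 0.97),
    ("Signature", "", 0.42),
]

accepted, review = filter_confident(extracted, threshold=0.8)
print("accepted:", accepted)
print("needs review:", review)
```

The right threshold depends on the documents and on how costly an extraction mistake is, so it is worth tuning against a sample of your own forms.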

Conclusion

In this exploration of general processors, we learned what the Document OCR, Form Parser, and Document Quality processors can do. Together they turn scanned documents into text, key-value pairs, tables, and quality scores that you can work with programmatically.

Next time, we'll delve into specialized processors that can extract even more nuanced information from common document types. If you are eager to experiment with the Form Parser, you can follow the code lab linked in the description below.

For further reading, please check out the documentation on general processors.

Keywords

Document AI, general processors, Document OCR, Form Parser, Document Quality, text extraction, key-value pairs, tables processing.

FAQ

Q1: What are general processors in Document AI?
A1: General processors are pre-trained models that help in extracting and analyzing information from various document types. Currently, there are three general processors: Document OCR, Form Parser, and Document Quality.

Q2: How does the Document OCR processor work?
A2: The Document OCR processor extracts text from scanned documents, which can be either typed or handwritten. It is capable of processing images even if they are of poor quality.

Q3: What does the Form Parser do?
A3: The Form Parser detects form fields in a scanned document and returns the extracted data as key-value pairs. It can also process tables and categorize header and body rows.

Q4: How can I assess the quality of a scanned document?
A4: The Document Quality processor evaluates scanned documents and provides a quality score based on identified defects such as blurriness or legibility issues.

Q5: Can I use Python to extract data from documents using Document AI?
A5: Yes, you can use Python to implement the extraction of key-value pairs from a document object processed by the Form Parser, as demonstrated with the provided code snippet.