AI-Powered Data Extraction: Training a Custom Model with AlgoDocs to Process BOL Documents

Introduction

In this article, we show how to use AI-based data extraction by training a custom model in AlgoDocs, focusing on bill of lading (BOL) documents. The BOL documents in this demonstration come from a real-world use case, so certain sections are masked; our goal is to capture the data that remains visible. The documents are photos taken with mobile devices, so layouts and field positions vary from one document to the next.

Overview

Our custom model will capture the following fields:

Bill of Lading Number
Origin Terminal Number
Pro Number
Number of Pallets
Ship to City, State, and ZIP Code
Prepaid Options (selected or unselected)
Items Table (with specific columns: Weight, Commodity Description, and Class)

Step 1: Creating the Extractor

We start by creating an extractor named "BOL Extractor." After selecting a sample document to process, we move to the extractor editor. AlgoDocs offers several extraction methods; we choose the custom model option, which opens the AI custom model editor. The uploaded sample document is processed during this setup.

Step 2: Labeling Documents

To train our custom model successfully, it’s essential to label documents accurately. We need at least ten labeled documents before training. The initial document is already uploaded, so we select nine more files, leaving the last two aside for testing the model. It's recommended to label more than ten files for complex cases, as a larger dataset usually yields better accuracy.
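The split described above (at least ten documents for labeling, the last two held out for testing) can be sketched in plain Python. This is only an illustration of the bookkeeping, not part of AlgoDocs: the file names and the `split_for_training` helper are hypothetical.

```python
MIN_TRAINING_DOCS = 10  # minimum number of labeled documents stated in this article


def split_for_training(files, holdout=2, minimum=MIN_TRAINING_DOCS):
    """Return (training, testing), keeping the last `holdout` files for testing."""
    ordered = sorted(files)
    if holdout < 0 or holdout > len(ordered):
        raise ValueError("holdout must be between 0 and the number of files")
    cut = len(ordered) - holdout
    training, testing = ordered[:cut], ordered[cut:]
    if len(training) < minimum:
        raise ValueError(
            f"need at least {minimum} training documents, got {len(training)}"
        )
    return training, testing


# Hypothetical file names: twelve phone photos of BOL documents.
sample_files = [f"bol_{i:02d}.jpg" for i in range(1, 13)]
training, testing = split_for_training(sample_files)
print(len(training), "files to label; held out for testing:", testing)
```

Keeping the held-out files out of the labeled set matters because the test in Step 5 is only meaningful on documents the model has not seen.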

Next, we create fields in the right pane. Each field requires a type: field, table, or selection mark. For example, "Prepaid" is a selection mark, while "Items" is a table. We add all the fields we need: Bill of Lading Number, Origin Terminal Number, Pro Number, Number of Pallets, Ship to City, Ship to State, Ship to ZIP Code, Prepaid, and the Items table.
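As a planning aid, the field list and its types can be written down before opening the editor. The sketch below is a hypothetical Python representation for our own notes; AlgoDocs defines these fields through its user interface, and none of the names here are an AlgoDocs API.

```python
from enum import Enum


class FieldType(Enum):
    """The three field types named in this article."""
    FIELD = "field"
    TABLE = "table"
    SELECTION_MARK = "selection mark"


# Field names as listed in the article; the mapping itself is illustrative.
BOL_FIELDS = {
    "Bill of Lading Number": FieldType.FIELD,
    "Origin Terminal Number": FieldType.FIELD,
    "Pro Number": FieldType.FIELD,
    "Number of Pallets": FieldType.FIELD,
    "Ship to City": FieldType.FIELD,
    "Ship to State": FieldType.FIELD,
    "Ship to ZIP Code": FieldType.FIELD,
    "Prepaid": FieldType.SELECTION_MARK,
    "Items": FieldType.TABLE,
}

# Columns to keep from the Items table.
ITEMS_COLUMNS = ("Weight", "Commodity Description", "Class")


def fields_of_type(field_type):
    """Return the names of all fields with the given type, in definition order."""
    return [name for name, kind in BOL_FIELDS.items() if kind is field_type]


print(fields_of_type(FieldType.SELECTION_MARK))
print(fields_of_type(FieldType.TABLE))
```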

Step 3: Document Labeling Process

Labeling is critical because the AI model learns from the annotations, so they must be accurate. To label a field or selection mark, we click its value in the document. For the Items table, we can use the auto-labeling feature, which detects tables automatically.

Next, we assign the appropriate column names to the detected table and mark unnecessary header rows as ignored. This ensures that the model learns only the item rows. After labeling all documents, we start training by clicking the "Train" button.
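Conceptually, naming columns and ignoring header rows amounts to the transformation below. It is a minimal sketch in Python with made-up cell values; the row layout and the `label_table` helper are assumptions for illustration and do not reflect how AlgoDocs stores detected tables.

```python
def label_table(rows, column_map, header_rows=1):
    """Turn raw detected rows into dicts, skipping header rows and unmapped columns.

    rows: list of lists of cell strings (assumed shape of a detected table)
    column_map: {column index: column name} for the columns to keep
    """
    labeled = []
    for row in rows[header_rows:]:
        labeled.append({name: row[idx].strip() for idx, name in column_map.items()})
    return labeled


# Made-up example table; the first row is a header and is ignored.
detected = [
    ["PKGS", "WEIGHT", "DESCRIPTION", "CLASS"],
    ["2", "450", "Machine parts ", "70"],
    ["1", "120", "Printed matter", "55"],
]

# Keep only the three columns this article cares about.
column_map = {1: "Weight", 2: "Commodity Description", 3: "Class"}
items = label_table(detected, column_map)

for item in items:
    print(item)
```

The first column (package count) is dropped simply because it is not in `column_map`, which mirrors leaving a detected column unassigned.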

Step 4: Post-Formatting and Training

Once training starts, we are taken back to the extractor editor. While the model trains, which can take up to an hour, we can apply formatting adjustments, such as removing unwanted commas from city names.
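The comma cleanup mentioned above is configured inside AlgoDocs, but its effect is easy to show in a few lines of Python. The `clean_city` function and the sample values are hypothetical and only illustrate the intended result.

```python
def clean_city(raw):
    """Replace commas with spaces, then collapse and trim whitespace."""
    return " ".join(raw.replace(",", " ").split())


# Made-up captured values, as they might appear before formatting.
for captured in ["Dallas,", " Salt Lake City ,", "Austin"]:
    print(repr(captured), "->", repr(clean_city(captured)))
```

Replacing the comma with a space before collapsing whitespace avoids gluing two words together when a comma sits between them without spaces.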

Step 5: Testing the Custom Model

Once training is complete, we test the model with the two files we set aside earlier. We create a new folder and upload the files to it for processing; the captured data then appears in the extracted data section.

After testing, we can export the extracted data in formats such as Excel, JSON, or XML. We can also review the results and correct any inaccuracies before finalizing them. AlgoDocs integrations can additionally automate file import and data retrieval.
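Once the data is exported as JSON, downstream processing is ordinary Python. The sketch below assumes a JSON shape (a list of documents, each with an "Items" list) chosen purely for illustration; the real AlgoDocs export layout may differ, and the values are made up. It flattens each document into one row per item and writes CSV to an in-memory buffer.

```python
import csv
import io
import json

# Assumed export shape with made-up values; check a real export before relying on it.
exported = json.loads("""
[
  {
    "Bill of Lading Number": "0001234",
    "Ship to City": "Dallas",
    "Prepaid": "selected",
    "Items": [
      {"Weight": "450", "Commodity Description": "Machine parts", "Class": "70"},
      {"Weight": "120", "Commodity Description": "Printed matter", "Class": "55"}
    ]
  }
]
""")


def flatten(documents):
    """Yield one flat row per item, repeating the document-level fields."""
    for doc in documents:
        header = {key: value for key, value in doc.items() if key != "Items"}
        for item in doc.get("Items", []):
            yield {**header, **item}


rows = list(flatten(exported))

# Write the flattened rows as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Repeating the document-level fields on every item row keeps each CSV line self-describing, at the cost of some redundancy.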

For any questions or support regarding this process, feel free to contact support at algodocs.com.

Keywords

AI-based data extraction
Custom model training
Bill of lading (BOL) documents
Document labeling
Extraction methods
AlgoDocs
Data export formats
Integration automation

FAQ

1. What is AlgoDocs?
AlgoDocs is a platform for data extraction and automation across various document types.

2. How many documents do I need to train a custom model?
You need at least ten labeled documents to train the model effectively, although more is recommended for complex cases.

3. Can I adjust the extracted data after processing?
Yes, you can review and apply corrections to the extracted data if needed.

4. What formats can I export the extracted data into?
The extracted data can be exported in Excel, JSON, or XML formats.

5. What should I do if I need support while using AlgoDocs?
You can contact support at algodocs.com for assistance with any questions or issues regarding the platform.