Businesses depend on information. Information enables decision-making and is critical to record-keeping. Since the time of the Mesopotamians, people have recorded or “documented” information, first in analog and then in digital forms. Business requires not just digital documentation but digital data. But much digital data sits in unstructured documents, especially in finance operations. Processing unstructured finance documents has been a challenge because extracting and converting data from analog and digital unstructured finance documents into structured digital data has, until now, been painful. Intelligent Document Processing is changing that.
The earliest documents recording inventory and the movement of goods—accounting ledgers—were written in clay in proto-cuneiform, before the written language called Cuneiform. These clay documents had rows and columns of pictograms representing items with holes impressed to indicate quantity. That start of business documentation eventually developed into ink-and-paper and finally electronic ledgers.
With the advent of computers in the 20th century, companies needed to record documented information in a different form: as digital data. For decades, that has meant the tedious process of manually keying data into a computer. Intelligent document processing, also called IDP, is changing that.
To understand intelligent document processing, we must first understand what a document is. To make a record of something is to “document” it—to write it down in one way or another to create a record of information.
A document is recorded information. In the context of business, a document is a written record providing evidence or information, such as a bill of sale, bill of lading, invoice, or a legal or official paper such as a deed or title, certificate, passport, or license. A document can also be a letter, email, article, or book.
A document is also defined as “text, written and stored on a computer” and “any input containing text; text stored in any format.” So, documents can also be “digital.” But being on a computer does not necessarily make a document’s content (data) digital—it might be merely an electronic image of an analog document with the data inaccessible for manipulation or calculation.
Most records today are captured in computers. Computers even issue “documents” like purchase orders and invoices digitally or on paper. Ironically, for a very long time, documents have been created electronically on a computer, then printed on paper to transmit to other parties, who then had to rekey the data from the paper copy to recreate it electronically in their own systems.
Now companies “export” documents in various electronic formats for digital transmission to customers or vendors. Electronic documents include PDF and TXT files, image files (JPG, GIF, PNG), word-processing files like DOCX, text stored in databases and spreadsheets, EDI, emails, text messages, and social media posts.
All these documents fall into one of three types:
- Structured documents
- Semi-structured documents
- Unstructured documents
The difference between these types is in the organization of the data in the document.
“Structured Documents” Have Structured Data
A structured document is one in which the data is organized so that a computer program can use it. That means the data is accessible to program algorithms and, critically, arranged so that an algorithm knows where to find each item. The data is categorized and comparable. Each data column, for example, contains the same kind of data, such as dollars, dates, or identification numbers. Databases and spreadsheets are structured documents.
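To make "an algorithm knows where to find what" concrete, here is a minimal Python sketch. The ledger contents and the column names (`invoice_id`, `invoice_date`, `amount_usd`) are invented for illustration and do not come from any particular system.

```python
import csv
import io

# Hypothetical ledger data; column names and values are illustrative only.
LEDGER_CSV = """invoice_id,invoice_date,amount_usd
INV-001,2024-01-15,1200.50
INV-002,2024-01-17,340.00
INV-003,2024-02-02,89.25
"""


def total_amount(csv_text: str) -> float:
    """Sum the amount_usd column.

    This works only because every row stores the amount under the same
    column name and in the same format, which is what "structured" means.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["amount_usd"]) for row in reader)


if __name__ == "__main__":
    print(total_amount(LEDGER_CSV))
```

The program never has to search for the amount: the column name tells it exactly where to look in every row.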
Unstructured Documents: Valuable but Unordered Data
If documents are not in a database or spreadsheet format, they’re “unstructured.” An “Unstructured Document” is a document that may contain valuable data, but the data is not organized in a fixed format. Consequently, it is difficult to “find” and capture the data for use.
An email may contain important information, but it is presented in narrative text. Letters, emails, image files, text files, blogs, and social media posts are examples of unstructured documents.
Semi-structured Documents: Specific Data without a Common Format
Semi-structured documents can be distinguished from wholly unstructured documents in that they are “specific purpose” documents that include certain data characteristics. However, the presentation or display of that data is not universally consistent. Examples of semi-structured documents include purchase orders, invoices, receipts, licenses—documents with a specific purpose and predictable content, but an unpredictable layout that varies from organization to organization.
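The following sketch illustrates why layout variation makes semi-structured documents hard to process with fixed rules. It is a hand-written, rule-based extractor, not the AI-based approach discussed later, and the invoice texts, labels, and patterns are all hypothetical.

```python
import re
from typing import Optional

# Two hypothetical invoices carrying the same field in different layouts.
INVOICE_A = "Invoice No: INV-1001\nDate: 2024-03-01\nTotal Due: $1,250.00"
INVOICE_B = "ACME Corp\nAmount payable 980.40 USD\nRef # A-77\nIssued 2024-03-05"
# A third vendor uses a label that the rules below have never seen.
INVOICE_C = "Statement 55\nBalance: 55.00"

# One pattern per known label variant; every new layout needs a new rule.
TOTAL_PATTERNS = [
    r"Total Due:\s*\$?([\d,]+\.\d{2})",
    r"Amount payable\s+([\d,]+\.\d{2})",
]


def extract_total(text: str) -> Optional[float]:
    """Return the invoice total if a known label matches, otherwise None."""
    for pattern in TOTAL_PATTERNS:
        match = re.search(pattern, text)
        if match:
            # Strip thousands separators before converting to a number.
            return float(match.group(1).replace(",", ""))
    return None


if __name__ == "__main__":
    for invoice in (INVOICE_A, INVOICE_B, INVOICE_C):
        print(extract_total(invoice))
```

The first two invoices are handled because their labels were anticipated. The third returns `None`, which is the maintenance burden that grows with every new trading partner.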
The Problem of Unstructured and Semi-Structured Documents
Process automation, data analysis, and data-driven decisions require structured digital data. But much of an enterprise's data is unstructured. A commonly cited figure, attributed to Gartner, is that 80 percent of an enterprise’s data is unstructured. In other words, more of a company’s potentially valuable data is unstructured than structured, and companies face a huge volume of unstructured or semi-structured data.
In financial operations such as accounts payable and accounts receivable, the lack of a universal solution for transmitting data from one organization to another has stymied companies' efforts to automate financial processes. The problem is that sellers have many buyers, and buyers have many sellers. Only a few share a system that allows data to flow electronically from one to the other without having to “de-digitize and re-digitize.” But automated processing requires data in a structured format. For most organizations, that means manually inputting (keypunching) data from unstructured documents into a data file that can then be processed automatically by an ERP or other automation system.
In the exchange of information in business transactions, there are systems like EDI, used nearly exclusively by large enterprises with a narrow group of similarly large trading partners. A few proprietary networks require all parties to be part of the network. But even leading large enterprises that take advantage of these tools have many customers or vendors that do not share them.
The Solution: Intelligent Document Processing
Until now, getting data off customer or vendor documents, whether analog or digital, and into one’s own system has required significant manual effort. Much of the work in accounts receivable, for example, involves manually preparing and converting information into digital data for computer systems.
The purpose of intelligent document processing is to replace that manual effort. Intelligent document processing brings artificial intelligence to bear on the problem of accurately and quickly reading, extracting, and converting data in unstructured documents into structured data.
Companies can now automatically scan, identify, extract, and organize data with artificial intelligence, including natural language processing, machine learning, and deep learning neural networks. Initially human-guided, intelligent document processing learns on its own over time to the point at which it can automatically convert 90 percent or more of the data “locked” in documents into usable data.
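One common way to combine automation with human guidance is to route low-confidence extractions to a reviewer. The sketch below assumes the extraction model attaches a confidence score to each field; the `ExtractedField` type, the `route` function, and the 0.85 threshold are hypothetical and do not describe any specific product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # hypothetical model score between 0 and 1


REVIEW_THRESHOLD = 0.85  # assumed cutoff for illustration, not a vendor setting


def route(fields: List[ExtractedField]) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Split fields into (auto-accepted, needs human review) by confidence."""
    accepted = [f for f in fields if f.confidence >= REVIEW_THRESHOLD]
    needs_review = [f for f in fields if f.confidence < REVIEW_THRESHOLD]
    return accepted, needs_review


if __name__ == "__main__":
    sample = [
        ExtractedField("invoice_number", "INV-1001", 0.98),
        ExtractedField("total", "1250.00", 0.91),
        ExtractedField("due_date", "2024-04-01", 0.62),
    ]
    accepted, needs_review = route(sample)
    print([f.name for f in accepted])
    print([f.name for f in needs_review])
```

In a design like this, corrections made by reviewers can be fed back as training examples, so the share of fields needing review would be expected to shrink over time.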
Intelligent document processing is changing financial operations. It takes over the formerly manual tasks, freeing financial operations staff for more valuable and engaging work, from problem-solving to strategic thinking and planning.
Intelligent document processing has many benefits, but the primary ones are the speed of data extraction and conversion (hundreds of documents in one minute versus one document in ten minutes manually) and a concurrent reduction in errors and cost.
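As a rough worked example of that speed difference, the sketch below compares processing times for a batch of documents. The manual rate comes from the figure above; the automated rate of 200 documents per minute is an assumed stand-in for "hundreds," and the batch size of 1,000 is arbitrary.

```python
MANUAL_MINUTES_PER_DOC = 10      # from the text: one document in ten minutes
AUTOMATED_DOCS_PER_MINUTE = 200  # assumption: "hundreds" taken as 200


def manual_minutes(doc_count: int) -> int:
    """Minutes needed to key in doc_count documents by hand."""
    return doc_count * MANUAL_MINUTES_PER_DOC


def automated_minutes(doc_count: int) -> float:
    """Minutes needed to process doc_count documents automatically."""
    return doc_count / AUTOMATED_DOCS_PER_MINUTE


if __name__ == "__main__":
    batch = 1000  # arbitrary batch size for illustration
    print(manual_minutes(batch))     # minutes of manual keying
    print(automated_minutes(batch))  # minutes of automated processing
    print(manual_minutes(batch) / automated_minutes(batch))  # speed-up factor
```

Under these assumptions, a batch of 1,000 documents takes 10,000 minutes (about 167 hours) manually versus 5 minutes automated, a factor of 2,000. The real ratio depends on the actual throughput of a given system.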
Find out more about Intelligent Document Processing and Emagia’s Gia Docs.