Industries of all kinds are drawing on advances in IT to improve operational efficiency through digital solutions. This extends to paper-based procedures, where data entry and the checking of paper forms impose heavy workloads. Hitachi’s form recognition service provides a service platform on which its AI-OCR engine performs highly accurate text recognition on a wide variety of fixed- and free-format forms, and it uses AI to assess the accuracy with which those forms have been read, thereby enhancing operational efficiency. This article describes how the service reduces the human work required for data entry in the processing of money transfers at a financial institution, and introduces dark data analysis for extracting the hidden value in business documents.
Innovation in digital technologies such as artificial intelligence (AI) and the Internet of Things (IoT) is bringing major changes to society. Hitachi’s new IT strategy involves an acceleration of its work toward the digitalization of all corners of society as a means of improving convenience for the public and enhancing efficiency in both the public and private sectors(1).
Meanwhile, paper forms such as invoices are still used to record and pass on information at government agencies, private-sector companies, and other organizations that engage in administrative work. The information on these forms is entered into computer systems for processing, meaning that when this work is done manually there is scope for improving efficiency by instead adopting AI-based optical character recognition (AI-OCR).
This article describes conventional OCR and AI-OCR, which combines OCR with AI, and presents an example of how operational efficiency has been boosted by Hitachi’s form recognition service, which incorporates AI-OCR. The article also describes Hitachi’s plans for dark data analysis, a technique for extracting the value hidden in ordinary business documents.
OCR is a way of reading the text contained in image data. AI-OCR is a form of OCR in the sense that it also extracts text from image data, but it is distinguished by its use of AI in recognition processing to overcome issues with conventional OCR.
In practice, this means that AI-OCR can read complex handwritten text (such as casual jottings, handwritten text that is not delineated by use of lined paper, or crossed-out text), forms such as invoices with layouts that vary from company to company, and loosely formatted documents such as contracts. Accuracy can be further enhanced through AI learning. The technique also has a strong affinity with robotic process automation (RPA), with which it can be combined to expand the scope of task automation at organizations where this is needed.
This section considers the future role of AI-OCR, the market for which is steadily growing, and the scope for its further development amid the shift to paperless practices as part of society-wide digitalization.
The market for AI-OCR is projected to increase from an estimated JPY700 million in FY2018 to JPY3.2 billion in FY2030(2). The two main reasons for this are as follows.
For these reasons, AI-OCR is expected to continue to play a part in future digitalization.
Fig. 1—Diagram of How Transition from OCR to AI-OCR Enables Use in Wider Range of Document Management Tasks By providing both technical and service enhancements, the transition from OCR to AI-OCR enables the technology to adapt to a variety of different task characteristics. Even as AI-OCR becomes established, however, it is anticipated that constraints imposed by existing practices and other such factors will see continued demand for the sort of fixed-format form scanning for which conventional OCR is used.
The transition from OCR to AI-OCR has expanded the range of document management tasks in which the technology can be put to use. Figure 1 shows how this works. AI-OCR has delivered service improvements as well as the technical improvements associated with advances in IT. With conventional OCR, dedicated scanners had to be obtained even for seasonal work with peaks in activity or for work involving forms that are handled in small volumes or in many different formats. AI-OCR, by contrast, is delivered in a standardized form as a cloud service. A later section discusses unstructured data, for which use of AI-OCR remains difficult.
Fig. 2—Overview of AI-OCR-based Form Recognition Service The service is equipped with recognition engines for both fixed-format forms and free-format forms such as invoices that do not follow a predefined format. It uses the recognition technique that best suits the task and delivers very accurate results together with a confidence score indicating this recognition accuracy.
Hitachi has been developing OCR technologies from the time OCR first entered commercial use through to present-day AI-OCR, and continues to do so with a view to the future. This work began in 1968 with the launch of the Hitachi H-8252 optical character reader, the first general-purpose OCR system to be manufactured in Japan(3), (4). Development of the technology has continued, culminating in the current cloud-based AI-OCR service, which uses deep learning and other such techniques to scan a wide range of business forms. By drawing on business and technical know-how built up over many years, Hitachi aims to develop new services for the world of the future that will resolve the workplace challenges faced by its customers.
Hitachi currently supplies its form recognition service primarily to financial institutions, using it as a means to shift to paperless practices. Suitable for data entry tasks in a wide range of industries, the form recognition service features a service platform on which a number of different AI-OCR engines use AI to perform high-accuracy text recognition, with capabilities that include the scanning of fixed- and free-format forms, printed and handwritten text, and two-dimensional barcodes. It is also equipped with a proprietary Hitachi algorithm that calculates a confidence score for recognition accuracy, providing an easy way to identify data that may have been scanned incorrectly. These technical features smooth integration of the service with other business applications, making it possible to automate a wide range of forms processing work. Figure 2 shows an overview of the form recognition service.
Fig. 3—Technology-based Approach to Use of Form Recognition Service to Improve Efficiency To improve the efficiency of various types of form processing, the form recognition service uses AI-OCR and other such technologies to scan both fixed- and free-format forms with high recognition rates, also providing confidence scores to help reduce the amount of work required for checking the OCR output.
The ways in which the form recognition service can be used to improve efficiency can be broadly divided into the following two approaches.
The form recognition service overcomes problems with applications where use of OCR is difficult for technical reasons. Figure 3 shows an overview.
Fig. 4—Service-based Approach to Use of Form Recognition Service to Improve Efficiency The form recognition service is cloud-based to provide scalability in response to individual requirements, including seasonal work where there is a peak in workload or work involving forms that are handled in small volumes or in many different formats.
The form recognition service also overcomes the problem of applications where conventional OCR is usable but impractical for cost-benefit reasons. Figure 4 shows an overview.
Fig. 5—Example Application of Form Recognition Service (Efficiency Improvement at Processing Center) Form data entry at the processing center is performed manually by staff and the entered data is checked visually against the original form. The confidence score provided by the form recognition service (high, medium, or low) can be used to identify when manual data entry is or is not required.
The processing of money transfers is one of the three main forms of activity at a financial institution. It refers to the making of payments or other movements of money from one account to another without the use of cash, and may take various forms such as bank transfers, remittances, or the shifting of funds between accounts. Bank branches accept various types of transfer forms from customers and pass them to a processing center, where the data on each form is entered and the entries are checked. Because these centers process such a large number of forms, high staff workloads and recruitment difficulties are among the issues they face. While they may deal with this by delegating money transfer processing to business process outsourcing services, this does not make the task of data entry and checking go away.
The form recognition service offers a way to pursue business sustainability and cut overheads by reducing the workload for data entry and checking at processing centers. Figure 5 lists the issues and the benefits of adopting the form recognition service.
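The confidence-based triage described for Figure 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the service's actual interface: the `FieldResult` type, the `route_field` and `route_form` functions, and the mapping of the three confidence levels to handling steps are all hypothetical. The article states only that the score (high, medium, or low) can be used to identify when manual data entry is or is not required.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    """The three confidence levels named in the article."""
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class FieldResult:
    """One recognized field (hypothetical structure, for illustration only)."""
    name: str
    text: str
    confidence: Confidence


def route_field(result: FieldResult) -> str:
    """Choose a handling step for one field.

    The level-to-step mapping below is an assumption, not the service's
    documented behavior.
    """
    if result.confidence is Confidence.HIGH:
        return "accept"        # use the recognized text without manual entry
    if result.confidence is Confidence.MEDIUM:
        return "visual_check"  # an operator confirms the text against the form
    return "manual_entry"      # an operator keys the value in by hand


def route_form(results: list[FieldResult]) -> dict[str, str]:
    """Map each field name on a form to its handling step."""
    return {r.name: route_field(r) for r in results}


form = [
    FieldResult("payee_name", "Hitachi Taro", Confidence.HIGH),
    FieldResult("amount", "12,000", Confidence.MEDIUM),
    FieldResult("account_number", "1Z34567", Confidence.LOW),
]
print(route_form(form))
```

Under this assumed mapping, only the low-confidence field would need to be keyed in by staff, which is where the reduction in data entry workload would come from.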
Forms processing is something that takes place not only at financial institutions, but also in numerous industries, such as the handling by government agencies of applications from the public or the freight industry’s processing of the various forms that it produces every day. The aim of the form recognition service is to support the processing of forms in different industries by scanning a wide variety of these forms and assessing the accuracy of the extracted information.
One of the applications where AI-OCR still struggles is the reading and analysis of loosely formatted business documents such as contracts or product catalogues (unstructured data).
A prerequisite for reading free-format forms using AI-OCR is that the format varies only to a limited extent. The total amount field in an invoice is an example: its location differs slightly from company to company, but there is a degree of uniformity in where it appears. Accordingly, this sort of form is sometimes referred to as partially free format.
The term unstructured data is used for documents that lack the sort of structure found in a relational database, with no consistency in where fields are positioned relative to one another and where the same field can be expressed differently from one document to another, as is the case in many business documents. Examples of unstructured data include text, images, video, and audio.
Another key word is “dark data.” This refers to the value hidden in the business documents created in the course of corporate activity and is so called because approximately 80% of generated data is never re-used(5), (6). As a term, dark data has a broad meaning that encompasses both structured and unstructured data.
One example of the analysis of dark data might involve wanting to extract sales totals from financial reports made by different companies, a task that is complicated by the companies using different terminology, such as “construction revenue” or “total sales.” Despite these variations in terminology, it is possible to identify this information by generating sales total, financial period, and other feature values from the hierarchical formats (tables, columns, and so on) used in these financial reports.
Hitachi is looking at applying this new technology to unstructured data that is currently difficult to scan and analyze.
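The sales-total example above can be made concrete with a deliberately simplified sketch. It is not the feature-based technique the article describes: it substitutes a plain synonym lookup over table rows that have already been extracted, only to show why differing terminology complicates the task. The `SALES_LABELS` table, the function names, and the sample rows are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical synonym table; a real system would derive features from the
# document's hierarchical structure rather than rely on a fixed list.
SALES_LABELS = {"total sales", "net sales", "construction revenue"}


def normalize(label: str) -> str:
    """Lower-case a row label and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", label.strip().lower())


def extract_sales_total(rows: list[tuple[str, str]]) -> Optional[int]:
    """Return the first value whose label is a known sales-total synonym.

    `rows` holds (label, value_text) pairs taken from an already-parsed table.
    """
    for label, value_text in rows:
        if normalize(label) in SALES_LABELS:
            return int(value_text.replace(",", ""))
    return None


report_a = [("Financial period", "FY2019"), ("Total Sales", "9,800,000")]
report_b = [("Financial period", "FY2019"), ("Construction  Revenue", "1,234,567")]
print(extract_sales_total(report_a), extract_sales_total(report_b))
```

A fixed synonym list breaks as soon as a company uses a label that is not in the table, which is one reason an approach based on features of the document structure is attractive for unstructured data.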
This article has described the efficiency improvements provided by a form recognition service based on AI-OCR. It is anticipated that applications for AI-OCR will go beyond the simple reading of text to encompass further operational efficiencies through integration with RPA. In the future, Hitachi intends to continue pursuing efficiency gains that bring innovation to the business workplace using technologies like AI-OCR.
Anyone interested in learning more about the form recognition service is urged to visit Hitachi’s Japanese website(7).