Seeing content through human eyes: Document AI

5/4/2021

The artificial intelligence that translates documents for machines

As promised, our journey through the world of Machine Learning has reached the chapter on Document AI. Having explained what "machine readable" means and why this requirement is essential for NLP systems, which work only on plain text, it is now time to discover which technologies make content itself accessible to machines.

Previous blog post on NLP

A multimodal approach to textual contents

Digital textual content, such as a PDF, always has two aspects: a graphical, visual one and a functional, structural one that is closely tied to the content. The connection between the two poses no difficulty for humans, who use sight to take in both an image and a text. Machines, however, need specific techniques to transform visual content into plain text. In short, Document AI is the technology created to automatically simulate the way a human reviews a document. To achieve this, Document AI takes a multimodal approach. On the one hand, it uses Natural Language Processing to handle plain text. On the other hand, since it starts from images, it must first perform image processing and then text encoding in order to generate plain text, that is, machine readable content.
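To make the three steps above more concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not how a production Document AI system works: the 3x3 "font" in GLYPHS and the function names render, recognize and tokenize are invented for illustration, and simple template matching stands in for real image processing.

```python
# Hypothetical 3x3 bitmaps standing in for glyph images ("1" = dark pixel).
GLYPHS = {
    "A": ("010", "111", "101"),
    "I": ("111", "010", "111"),
    " ": ("000", "000", "000"),
}


def render(text):
    """Rasterize text into three rows of pixels: the 'image' a machine receives."""
    rows = ["", "", ""]
    for ch in text:
        for i, line in enumerate(GLYPHS[ch]):
            rows[i] += line
    return rows


def recognize(image, width=3):
    """Image processing and text encoding: cut the image into cells, match each to a glyph."""
    lookup = {bitmap: ch for ch, bitmap in GLYPHS.items()}
    chars = []
    for x in range(0, len(image[0]), width):
        cell = tuple(row[x:x + width] for row in image)
        chars.append(lookup.get(cell, "?"))  # "?" marks an unrecognized cell
    return "".join(chars)


def tokenize(text):
    """NLP stage: it can only start once plain text exists."""
    return text.split()


image = render("AI AI")   # what the machine is given: pixels only
text = recognize(image)   # pixels turned back into characters
tokens = tokenize(text)   # plain text handed over to NLP
print(image)
print(text, tokens)
```

The point of the sketch is the order of the stages: tokenize has nothing to work on until recognize has turned pixels into characters.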

Document AI, where machine readable formats are made

Once again, we are back to the heart of the matter: the computing power of machines can do nothing unless it is adequately supported by a typically human skill, interpretation. That is why formats that machines cannot use (PDF being, once again, the best known and most widely used example) always have to be transformed into machine readable standards. Doing so means simulating what humans do, namely moving from an image to a sequence of characters and, finally, to verbal content. The fundamental difference is evident, though: humans always use their eyes as the physical interface, whereas in computing, and therefore for machines, a text is a different kind of content from an image. In other words, humans can simultaneously take into account the conceptual content of a text and the graphical aspects that characterize its structure (its formatting, for instance). In a Machine Learning context, this single act of human understanding has to be reproduced as separate steps performed in sequence. Document AI is dedicated specifically to automating this process.
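As a rough illustration of "separate steps", the sketch below handles appearance and content in two distinct passes and only then joins them. Everything in it is an assumption made for the example: the Span type, the font sizes, the sample sentences and the rule "larger than the median font size means heading" are invented, and real systems rely on far richer visual features.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Span:
    text: str         # the characters
    font_size: float  # a visual feature (made-up values)


def label_structure(spans):
    """Visual pass: spans larger than the median font size are treated as headings."""
    body_size = median(s.font_size for s in spans)
    return ["heading" if s.font_size > body_size else "body" for s in spans]


def read_content(spans):
    """Textual pass: keep only the characters and ignore appearance."""
    return [s.text for s in spans]


spans = [
    Span("Article 1", 14.0),
    Span("Banks shall report monthly.", 10.0),
    Span("Reports are due by the fifth day.", 10.0),
]

labels = label_structure(spans)                  # structure from appearance
document = list(zip(labels, read_content(spans)))  # joined only at the end
for label, text in document:
    print(f"{label:8} {text}")
```

A human reader does both things in a single glance; here they are two functions whose results have to be combined explicitly.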

High level encoding on regulatory texts: Aptus.AI

At Aptus.AI, this is exactly what we deal with. Our goal is to create automated descriptive markup for textual content that is represented as images and is therefore, by its very nature, not accessible to machines. Take the Portable Document Format, the widely used PDF, once more: a PDF describes how a page looks rather than how the text is logically organized, so in practice it is treated as an image and not as a structured sequence of characters. At Aptus.AI we have succeeded in integrating Document AI and Natural Language Processing, and the result is Daitomic. This AI solution for financial compliance management makes machine readable those documents that are not so at the origin (such as financial regulations published as PDF files), while preserving the structure and complexity of the regulatory text itself. This is why Daitomic will revolutionize the management of banking documents: get in touch with us to learn more!
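To give an idea of what descriptive markup that preserves structure can look like, here is a small sketch using Python's standard xml.etree.ElementTree module. It does not show Daitomic's actual format: the element names regulation, article and paragraph, the to_markup helper and the sample text are all hypothetical.

```python
import xml.etree.ElementTree as ET


def to_markup(title, articles):
    """Wrap already extracted text in elements that describe its structure."""
    root = ET.Element("regulation", {"title": title})
    for number, paragraphs in articles:
        article = ET.SubElement(root, "article", {"number": str(number)})
        for paragraph in paragraphs:
            ET.SubElement(article, "paragraph").text = paragraph
    return root


# Made-up content standing in for text recovered from a PDF.
root = to_markup(
    "Sample regulation",
    [
        (1, ["First paragraph.", "Second paragraph."]),
        (2, ["Only paragraph."]),
    ],
)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Once the structure is explicit, a machine can address a precise part of the text.
second = root.find("./article[@number='1']/paragraph[2]").text
print(second)
```

The last lookup is what a flat image of the page cannot offer: asking for "the second paragraph of article 1" and getting exactly that text back.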