November 30, 2022
Solving the Problem of Unstructured Data in Automation

One of the hurdles that has impaired the adoption of intelligent automation at scale for many organizations is the sheer amount of unstructured data that underpins many business processes. Texts, CSVs, video and audio files, PDFs, contracts, emails and more are all examples of unstructured formats in which, by some estimates, up to 85 percent of an average company’s data exists. Extracting the information that lives in these formats is challenging for RPA and intelligent automation systems.

Boston-based Indico Data has focused on automating document-based workflows that incorporate unstructured data since its founding in 2014, just as deep learning was emerging as a viable artificial intelligence technology.

Co-founders Slater Victoroff and Madison May met as engineering students and recognized that deep learning had the potential to solve a challenging category of problems for businesses that require AI to understand meaning or intent. They also recognized that, while the potential was enormous, deep learning was very difficult to use effectively, so its practical applications had been limited.

After winning several Kaggle competitions, the pair launched Indico Data based on a deep learning technique known as transfer learning, leveraging a new approach to an old problem: how much information it takes to train an AI-based system.

According to Tom Wilde, the company’s CEO who was brought on in 2017 to shepherd Indico Data’s revolutionary product to market, “input-output” is a well-known but persistent problem in the artificial intelligence and machine learning world.

“Historically, the only way to make AI work was to feed it tremendous amounts of training data,” Wilde notes. “It really has been one of the big impediments to the wide adoption of artificial intelligence. The team’s breakthrough was they had created the ability for users to generate custom machine learning models with very small amounts of training data.”

If you’re going to use a relatively small amount of data to train an AI-based system, however, the data had better be of high quality, Wilde says. And the best place to find high-quality training data is with the humans closest to the manual processes and workflows: employees working directly with the largely unstructured data that any automation technology would need to access.

That insight led to a “human AI and machine teaching” approach that enables businesses to address unstructured data, unlike rules- or template-based approaches that can’t easily handle unstructured inputs without breaking.

“The premise was, we need to focus on the subject matter experts,” Wilde explains. “The people closest to the data and the documents are the people who should be creating this training data. So, we knew we had to build an application experience that was point-and-click friendly to a business user. The big insight is marrying the technology disruption with a consumer-friendly application experience for enterprise business users. That has made all the difference.”

Finding Growth

The Boston-area company has spent the last several years marshalling resources and honing its go-to-market strategy, building a foundation it hopes will catapult it forward along with the tailwind the Covid-19 pandemic has provided automation, AI/ML, and analytics technology.

With the basis of a platform established, Wilde says Indico Data had to identify where its product would make the most impact. It found enterprises that had committed to RPA—with hundreds of manual processes and mountains of unstructured data RPA couldn’t touch—provided a sweet spot.

“We found the people that really react to our product favorably are those who have had some exposure to automation through RPA,” he notes. “Often, that comes in the form of RPA Centers of Excellence at large enterprises. Sometimes it’s lines of business where unstructured data is really vital to how they execute their function. Think of insurance claims in banking, or trade processing in capital markets. In commercial real estate, it’s leases. These are examples where unstructured data is the primary problem that they have to focus on.”

Despite the challenges Covid presented, the company secured working capital when it landed a $22 million Series B funding round, enabling continued development of its platform. Although 2020 was a very tough year for a wide range of businesses, RPA and intelligent automation businesses have thrived. Wilde notes that recent quarters have been among the company’s most successful for sales, as automation technology providers are in high demand.

“The pandemic has definitely been a catalyst. It really has been a human tragedy and has been a great disruption to all our lives, personally and professionally,” Wilde says. “But one of the outcomes of the pandemic was a sharp focus by organizations on automation as a way to harden and improve the operating resilience and stability of a big company. So, automation as a category has seen huge focus from the enterprise and the investment community, and we were no exception.”

Moving Forward

Often, much like with RPA, companies that experimented with AI were disappointed when trying to scale the technology. Wilde says building functional algorithms and models isn’t the hard part—even though that’s where much of the emphasis is when companies adopt the technology.

“The problem is, that doesn’t get you production,” he explains. “You have to think about how to deploy, how to scale, how to govern, how to explain it to compliance folks. And that’s really where an application is required. To address this, we built a vertically integrated solution for building custom machine learning models and paired it with a complete deployment solution. We think we are the leaders in terms of the technical approach to this problem, but also in terms of getting our customers to capture these gains in efficiency and ROI and harvest this huge source of unstructured data to accomplish this.”

Wilde says companies will use Indico’s platform not only to help them generate structured data from mountains of unstructured data, but to help derive insights from that data and use it to change decision making.

