The compelling need for machine learning in the public sector

The sheer number of backlogs and delays in the public sector is troubling for an industry designed to serve citizens. Last summer, the four-month wait to receive a passport made the news, up significantly from the pre-pandemic standard of a 6-8 week turnaround. Most recently, the Internal Revenue Service (IRS) announced that it entered the 2022 tax season with 15 times its usual backlog of filings, alongside its plan for working through it.

These widely publicized backlogs are not the result of a lack of effort. The industry has made headway with technological advancements over the past decade. Yet legacy technology and outdated processes still plague some of our country’s most prominent departments. Today’s agencies need to adopt digital transformation efforts aimed at reducing data backlogs, improving citizen response times and driving better agency results.

By embracing machine learning (ML) solutions and integrating improvements in natural language processing (NLP), backlogs can be a thing of the past.

How ML and AI can bridge the physical and digital worlds

Whether it’s tax documents or passport applications, manually processing items takes time and is prone to errors on the sending and receiving side. For example, a sender may accidentally check an incorrect box, or the recipient may interpret the number “5” as the letter “S”. This causes unforeseen delays in processing or, worse, inaccurate results.
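
One simple safeguard against this kind of misreading is to check a field against the type of value it is expected to hold. The sketch below is only an illustration: the helper name `normalize_numeric_field` and the confusion table are assumptions made for this example, not part of any specific product.

```python
from typing import Tuple

# Illustrative table of letters that are easily misread as digits.
# This mapping is an assumption for the sketch, not an exhaustive list.
CONFUSABLE = {"S": "5", "O": "0", "I": "1", "l": "1", "B": "8"}


def normalize_numeric_field(raw: str) -> Tuple[str, bool]:
    """Return (value, needs_review) for a field expected to hold only digits."""
    cleaned = raw.strip()
    # Swap each confusable letter for the digit it most likely represents.
    corrected = "".join(CONFUSABLE.get(ch, ch) for ch in cleaned)
    if not corrected.isdigit():
        # Still not numeric (or empty): keep the original and flag it.
        return cleaned, True
    # Flag any value that had to be corrected so a person can confirm it.
    return corrected, corrected != cleaned
```

A value that had to be corrected is still flagged, so a person can confirm the substitution instead of the system silently guessing.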

But managing the growing problem of government document and data backlogs is not as simple as uploading information to processing systems. The sheer volume of citizen documents and information entering agencies, in varied unstructured formats and conditions that are often difficult to read, makes it nearly impossible to reliably and efficiently extract data for downstream decision-making.

Embracing artificial intelligence (AI) and machine learning in day-to-day government operations, as other industries have done in recent years, can provide the intelligence, agility, and edge needed to streamline processes and automate document-centric processes end to end.

Government agencies need to understand that real change and lasting success will not come from quick fixes built on legacy optical character recognition (OCR) or alternative automation solutions, given the sheer volume of incoming data.

Bridging the physical and digital worlds can be achieved with Intelligent Document Processing (IDP), which uses proprietary ML models and human intelligence to classify and convert complex, human-readable document formats. PDFs, images, emails and scanned forms can all be turned into structured, machine-readable information using IDP. It does this with greater accuracy and efficiency than older alternatives or manual approaches.

In the case of the IRS, awash with millions of documents such as 1099 forms and individual W-2s, advanced ML models and IDP can automatically identify the digitized document, extract printed and handwritten text, and structure it in a machine-readable format. This automated approach speeds up processing times, brings in human review where necessary, and is highly effective and accurate.
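
The identify, extract, and structure steps described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any IDP product works: the text is assumed to be already digitized, simple keyword matching stands in for a trained classifier, and every name here (`FORM_CUES`, `FIELD_PATTERNS`, `REVIEW_THRESHOLD`, `process`) is hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical keyword cues standing in for a trained document classifier.
FORM_CUES = {
    "W-2": ["wage and tax statement", "w-2"],
    "1099": ["1099", "nonemployee compensation"],
}

# Hypothetical field patterns; real forms need far richer extraction.
FIELD_PATTERNS = {
    "ssn": re.compile(r"\b(\d{3}-\d{2}-\d{4})\b"),
    "amount": re.compile(r"\$\s?([\d,]+\.\d{2})"),
}

REVIEW_THRESHOLD = 0.5  # assumed cutoff for routing to a human reviewer


@dataclass
class ProcessedDocument:
    form_type: str
    confidence: float
    fields: Dict[str, str] = field(default_factory=dict)
    needs_human_review: bool = False


def classify(text: str) -> Tuple[str, float]:
    """Score each form type by the share of its cues found in the text."""
    lowered = text.lower()
    best_type, best_score = "unknown", 0.0
    for form_type, cues in FORM_CUES.items():
        score = sum(cue in lowered for cue in cues) / len(cues)
        if score > best_score:
            best_type, best_score = form_type, score
    return best_type, best_score


def process(text: str) -> ProcessedDocument:
    """Identify the form, extract fields, and decide whether a person should check it."""
    form_type, confidence = classify(text)
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    # Low classifier confidence or any missing field sends the document to a person.
    needs_review = confidence < REVIEW_THRESHOLD or len(fields) < len(FIELD_PATTERNS)
    return ProcessedDocument(form_type, confidence, fields, needs_review)
```

The design point is the last step: anything the system is unsure about goes to a person instead of being passed downstream as if it were reliable.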

Advancing ML Efforts with NLP

In addition to automation and IDP, the introduction of ML and NLP technologies can significantly support the industry’s quest to improve processes and reduce backlogs. NLP is an area of computer science concerned with enabling computers to process and understand text and spoken words much as humans do, traditionally drawing on computational linguistics, statistics, and data science.

The field has made significant progress, such as the introduction of complex language models containing more than 100 billion parameters. These models can power many complex text processing tasks, such as classification, speech recognition, and machine translation. These improvements could support even greater data extraction in a world overrun with documents.

Looking ahead, NLP is on track to reach a level of text comprehension comparable to that of a human knowledge worker, thanks to technological advances driven by deep learning. Similar advances in deep learning also enable computers to understand and process other human-readable content, such as images.

For the public sector specifically, this content may include images contained in disability claims or other forms and applications that consist of more than just text. These improvements could also benefit the downstream stages of public sector processes, such as ML-based decision-making for agencies that determine unemployment benefits, Medicaid insurance, and other invaluable government services.

Not modernizing is no longer an option

While we’ve seen a handful of promising advancements in digital transformation, the call for system change has yet to be fully answered.

To move forward today, agencies must go beyond patching and investing in various legacy systems. Patchwork fixes and investments in legacy processes do not support new use cases, are brittle in the face of change, and cannot handle unexpected increases in volume. Instead, it should be a no-brainer to introduce a flexible solution that can take even the most complex, hard-to-read documents from input to outcome.

Why? Citizens deserve more from the agencies that serve them.

CF Su is VP of machine learning at Hyperscience.
