TDWI Articles

How Robotic Process Automation Eases Data Management

Programs that perform repetitive tasks could be useful for enterprises looking to automate data management tasks, from data cleansing and normalization to data wrangling and metadata management.

Robotic process automation (RPA) is gaining popularity as more companies understand its benefit for management and housekeeping in big data and artificial intelligence. Originally based on screen scraping from 3270 terminals, RPA has grown more sophisticated. It refers to programs providing analogues of manual processes. These scripts create robotic actions that automatically perform tasks, principally of a repetitive nature. They are, in essence, software robots.


RPA in the Data Center

RPA is a concern for IT because it needs to be set up and customized to perform data-intensive tasks across a range of deployments. It has been useful in many areas of business, from finance to retail. RPA and data management, however, make a particularly compelling combination.

In data management, intriguing new possibilities are emerging for employing RPA. Data management includes numerous repetitive tasks in aggregation and curation that can benefit from automation. Applying RPA to large data repositories makes tasks such as data cleansing, normalization, data wrangling, and creation or updating of metadata more efficient. All of these tasks are highly repetitive, yet each also tends to be unique in its details, because every data access situation demands special considerations. This combination of repetition and case-by-case customization provides an ideal opportunity for applying RPA.
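To make the idea of a scripted cleansing task concrete, here is a minimal Python sketch of the kind of repetitive normalization and deduplication step a software robot might run. It is an illustration under stated assumptions, not a depiction of any particular RPA product: the field names (`name`, `email`), the sample records, and the functions `normalize_record` and `cleanse` are all hypothetical.

```python
import re


def normalize_record(record):
    """Collapse runs of whitespace, trim each string field, and lowercase the email."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = re.sub(r"\s+", " ", value).strip()
        cleaned[key] = value
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def cleanse(records):
    """Normalize every record and drop duplicates, keeping the first occurrence."""
    seen = set()
    result = []
    for record in records:
        cleaned = normalize_record(record)
        # Hypothetical matching rule: same name (ignoring case) and same email.
        key = (cleaned.get("name", "").casefold(), cleaned.get("email", ""))
        if key in seen:
            continue
        seen.add(key)
        result.append(cleaned)
    return result


# Invented sample data: the first two rows describe the same person.
raw = [
    {"name": "  Ada   Lovelace ", "email": "ADA@Example.com"},
    {"name": "ada lovelace", "email": "ada@example.com "},
    {"name": "Grace Hopper", "email": "grace@example.com"},
]
cleaned_records = cleanse(raw)
```

The matching rule is the part that tends to be unique to each data source. The surrounding loop stays the same from one deployment to the next, which is why this kind of task suits automation.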

RPA can be combined with other techniques to create sophisticated data handling solutions. One example is the use of RPA to extract information from OCR documents to create metadata and reduce content to a usable format for big data or machine learning processes. Other transformations of very large data sets that benefit from RPA include:

  • Data input, replacing manual keying or file submission
  • Migration of data between dissimilar data stores, such as data acquired from corporate acquisitions or mergers
  • Data combination, such as aggregation to provide richer data sources for advanced processing algorithms
  • Screening for irregularities and improving data quality through repetitive tasks
  • Enabling input and verification of manual processes (traditionally applied with RPA, but enhanced for larger data sources)
  • Data deduplication and data extraction applied to new data sources such as IoT machine logs and other system-generated raw data

This catalog will expand as RPA develops further. Although mainly a record-and-play technology, RPA will benefit as machine learning (ML) is added directly to processes to create a more dynamic response. Already, ML is used in external components of RPA solutions, such as NLP and OCR for documents and image processing for graphics.

Mining RPA Logs

RPA also provides a record of the transformations it undertakes. This can be extremely important for process optimization, regulatory compliance, and maintaining transparency in a complex data environment. RPA is also moving into analytics in the cloud, particularly in retail applications, but its potential in solving data management problems is of particular interest. It can be a powerful tool for maintaining data quality in the complex arena of big data and AI. Analyzing RPA logs can show an enterprise where efficiency is lost or where processes are performing below expectations, which in turn helps it improve data quality and data extraction.
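As a rough illustration of what mining such a record could look like, the Python sketch below summarizes a log by task and flags tasks that fail too often or run too slowly. The log layout (`task`, `status`, `seconds`), the task names, the thresholds, and the functions `summarize_log` and `underperforming` are assumptions made for this example. Real RPA tools each define their own log formats.

```python
from collections import defaultdict
from statistics import mean


def summarize_log(entries):
    """Group log entries by task and compute run count, failure rate, and mean duration."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["task"]].append(entry)

    summary = {}
    for task, runs in grouped.items():
        failures = sum(1 for run in runs if run["status"] != "ok")
        summary[task] = {
            "runs": len(runs),
            "failure_rate": failures / len(runs),
            "mean_seconds": mean(run["seconds"] for run in runs),
        }
    return summary


def underperforming(summary, max_failure_rate=0.1, max_mean_seconds=5.0):
    """Return the names of tasks that exceed either (hypothetical) threshold."""
    return sorted(
        task
        for task, stats in summary.items()
        if stats["failure_rate"] > max_failure_rate
        or stats["mean_seconds"] > max_mean_seconds
    )


# Invented sample log entries for two hypothetical robot tasks.
log = [
    {"task": "dedupe_customers", "status": "ok", "seconds": 2.0},
    {"task": "dedupe_customers", "status": "ok", "seconds": 4.0},
    {"task": "load_invoices", "status": "error", "seconds": 1.0},
    {"task": "load_invoices", "status": "ok", "seconds": 3.0},
]
summary = summarize_log(log)
flagged = underperforming(summary)
```

With this sample data, `load_invoices` is flagged because half of its runs failed, while `dedupe_customers` stays within both thresholds.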

Mining RPA logs is likely to become critical as the need for more transparency into operations grows. Current fears about privacy and unanticipated or illegal use of records make it imperative to monitor automated processes. This is already a concern in financial technologies and other highly regulated industries.

Robots to the Rescue

RPA software robots, like their mechanical cousins, can be seen as threats to human jobs. However, as with industrial robots, the jobs replaced are repetitive and require fewer skills. RPA tends to reduce headcount in a static situation, but the data center is already suffering from a skills shortage in an increasingly complex and dynamic environment. Many tasks can no longer be handled manually as volume and velocity increase. RPA and AI in its various forms are already taking up the slack.

As the amount and types of data grow, RPA provides interesting solutions for improving results in all analytics areas. Augmented by machine learning and AI, RPA can streamline input (such as initial processing of images and documents) and improve processes through monitoring of RPA logs. RPA can reduce headcounts for manual and highly repetitive processes, and it can be used to improve data quality.

With all of these applications, it is easy to see how RPA is likely to become increasingly important in data management. As this technology develops and is used in diverse applications, its added functions will provide even greater advantages to data management solutions.

 

About the Author

Brian J. Dooley is an author, analyst, and journalist with more than 30 years' experience in analyzing and writing about trends in IT. He has written six books, numerous user manuals, hundreds of reports, and more than 1,000 magazine features. You can contact the author at bjdooley.query@yahoo.com.
