March 28, 2023

Documents: An Unmanaged Risk


While the drive to an all-digital future continues, documents are still a fundamental part of many organisations’ processes.

Whether it’s application forms, surveys, certificates, or the seemingly unavoidable creation of spreadsheet after spreadsheet, colleagues and customers constantly exchange files. Documents are downloaded, duplicated, and reshared across numerous drives, mailboxes, and platforms, with little thought given to the risks and costs involved.

How can businesses apply the same rigour to documents as they do to databases, and make full use of the information those documents contain to improve efficiency, income, compliance, and customer satisfaction?

Acknowledge the Risks

The disparity between the control applied to structured data in databases and unstructured data in documents is alarming, especially when considered in the context of data protection regulations, financial risk, and the risk to business processes.

Some of the most challenging risks to identify are those where data makes an unknown and unsanctioned journey through document drives. This could be as simple as an HR headcount spreadsheet containing employee data that is stored on a drive and feeds into downstream processes. Cataloguing and indexing documents like this is key to complying with data protection regulations, for example when honouring an individual’s right to be forgotten.

Retention policies are notoriously difficult to apply to documents, so documents containing personal and sensitive data often breach both regulatory and internal policies. The difficulty lies in discovering where these documents are and which policies have been applied to them; the right software tools can greatly accelerate this journey and mitigate the risks.

Count the Cost

Leaving risks aside, the cost to a business of storing the same document multiple times soon escalates in both financial and environmental terms.

Common practices include saving each revision of a document as a new, separately numbered file, downloading the same document multiple times, and sending a separate copy of a document to each of several recipients. While solutions such as OneDrive and link-sharing have helped reduce this, old habits die hard. As with duplicated data in a database, this practice consumes unnecessary storage and forces the business to manage large amounts of duplicated data, at a cost.

There is an energy and material overhead too. It may seem small at a local level, but globally, once the energy used by data centres, the materials used for storage, and the limited lifespan of drives are taken into account, unnecessary document retention carries significant environmental costs.

Quantify the Problem

To start improving document management, the state of the document space must first be understood. We need to be able to answer questions such as:

  • How many documents do we have?
  • Where are documents stored?
  • How much space is taken up by duplicate documents?
  • Are we retaining information that we should have disposed of?

Here again, the right software can help. The goal is to consolidate all the information that can be retrieved from documents into an accessible and manageable platform. This will allow a business to search for critical information, such as last modified date, document size, document title, author, and other metadata.
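The metadata side can be sketched in a similar way. The example below is a minimal sketch with hypothetical names: it builds one record per file from what the file system itself exposes (path, size, last modified date) and uses the file name as a stand-in for the title. Fields such as author are stored inside each document's own properties and would need format-specific parsing, so they are left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class DocumentRecord:
    path: str
    title: str          # assumption: the file name without its extension
    size_bytes: int
    modified: datetime  # last modified time, in UTC


def collect_metadata(root: str) -> list[DocumentRecord]:
    """Hypothetical helper: build one metadata record per file under root."""
    records = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            records.append(DocumentRecord(
                path=str(path),
                title=path.stem,
                size_bytes=st.st_size,
                modified=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
            ))
    return records


def search(records: list[DocumentRecord], title_contains: Optional[str] = None,
           modified_before: Optional[datetime] = None,
           min_size: int = 0) -> list[DocumentRecord]:
    """Return the records that match every metadata filter supplied."""
    return [
        r for r in records
        if (title_contains is None or title_contains.lower() in r.title.lower())
        and (modified_before is None or r.modified < modified_before)
        and r.size_bytes >= min_size
    ]
```

For instance, `search(records, title_contains="headcount")` would surface the HR spreadsheet described earlier, wherever on the share it happens to be stored.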

The second crucial element is being able to search not only the metadata, but also the contents of a document. There are several methodologies for this, but one that works well is condensing the document content and storing it alongside the metadata. Although this duplicates some of the information, it gives a data management tool a compact, searchable representation of each document's content that it can query together with the metadata, for example to find every document that mentions a particular customer.
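One simple way to condense content, shown here purely as an illustration rather than the method any particular tool uses, is to keep each document's most frequent terms and store them in the same index entry as its metadata. The sketch assumes the text has already been extracted from the file; the stopword list, the `top_n` limit and the function names are all assumptions.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; real tools use far larger ones.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "is", "are", "with"}


def condense(text: str, top_n: int = 20) -> list[str]:
    """Reduce extracted document text to its most frequent non-stopword terms."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [term for term, _count in counts.most_common(top_n)]


def build_index_entry(metadata: dict, text: str, top_n: int = 20) -> dict:
    """Store the condensed content alongside the document's metadata."""
    return {**metadata, "terms": condense(text, top_n)}


def search_content(entries: list[dict], term: str) -> list[dict]:
    """Return the index entries whose condensed content includes the term."""
    term = term.lower()
    return [e for e in entries if term in e["terms"]]
```

The trade-off sits in the `top_n` cut-off: a term mentioned only once in a long document may be dropped. A fuller representation, such as a full-text index, avoids that at the cost of more storage.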

Change the Future

Consolidating document data into a searchable format is half the journey; the other half is leveraging the technology to change the way documents are managed, understood and, most importantly, used to improve outcomes.

Once documents are indexed, data-quality rules can be applied to the information retrieved from them, which is a powerful capability. The processes that produce the most document issues can then be redesigned, and documents that breach policies or regulations can be dealt with by the business.
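Data-quality rules can be modelled as named checks run over each index entry. In the sketch below, the six-year retention period and the email-address pattern used as a signal of personal data are illustrative assumptions, not policies taken from this article; a real personal-data check would need far more than one regular expression.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

# Assumption: an email address is treated as a crude signal of personal data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the document breaches the rule


def retention_rule(max_age: timedelta, now: datetime) -> Rule:
    """Flag documents last modified longer ago than the retention period."""
    return Rule("retention_exceeded", lambda doc: now - doc["modified"] > max_age)


def personal_data_rule() -> Rule:
    """Flag documents whose text appears to contain an email address."""
    return Rule("possible_personal_data", lambda doc: bool(EMAIL.search(doc["text"])))


def evaluate(docs: list[dict], rules: list[Rule]) -> dict[str, list[str]]:
    """Map each breaching document's path to the names of the rules it breaches."""
    issues = {}
    for doc in docs:
        breached = [rule.name for rule in rules if rule.check(doc)]
        if breached:
            issues[doc["path"]] = breached
    return issues
```

Calling `evaluate(docs, [retention_rule(timedelta(days=6 * 365), datetime.now(timezone.utc)), personal_data_rule()])` returns only the documents that need action, keyed by path, which is the list a business would then review or route to the relevant owners.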

Because documents change so frequently, interaction and monitoring must be as continuous and as automated as possible. This is typically achieved by regularly scanning file servers, mailboxes and other resources, and applying data-quality rules to identify documents that require action. It’s important to treat document discovery and management not as a one-time project, but as an ongoing component of the organisation’s overall data management strategy.
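Continuous monitoring does not have to mean re-reading every file on every run. One common approach, sketched below under the assumption of a locally accessible file share, is to store a snapshot of each file's size and modification time and compare it with the next scan, so that only new or changed documents need to be re-indexed and re-checked. The function names are hypothetical, and a file rewritten with exactly the same size and timestamp would be missed; hashing the content closes that gap at a higher cost.

```python
from pathlib import Path


def snapshot(root: str) -> dict[str, tuple[int, int]]:
    """Hypothetical helper: record size and mtime (ns) for every file under root."""
    snap = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            snap[str(path)] = (st.st_size, st.st_mtime_ns)
    return snap


def diff_snapshots(old: dict, new: dict) -> dict[str, set[str]]:
    """Classify files as added, deleted or changed between two scans."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": new_keys - old_keys,
        "deleted": old_keys - new_keys,
        # Assumption: a different size or mtime is taken to mean the content changed.
        "changed": {p for p in old_keys & new_keys if old[p] != new[p]},
    }
```

In practice the previous snapshot would be persisted between scheduled runs, and the "added" and "changed" sets fed back into indexing and rule evaluation, while "deleted" entries are removed from the index.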

Ideally, document analysis is a two-way street. Once issues have been resolved, the business should set and enforce document standards. A tool that automatically communicates these standards to the people who create and manage documents increases confidence that document data adheres to policy and allows organisations to demonstrate that data is managed proactively.

By combining tried and trusted technologies with the latest advances in natural language processing, modern document management software can free businesses from the burden of manual document maintenance and provide their processes with higher-quality, more reliable data to drive the organisation forward.