Most of us see debt as something to be avoided, so minimizing “technical debt” has long been a priority for development teams. Essentially, the term means an erosion of software quality caused by taking the expedient path, not allowing enough time, or simple carelessness and error. There’s also growing interest in what many consider a subset of technical debt, which is “data debt”. What is it? The accumulation of bad practices and data errors that contaminate business management decisions. Enterprises report a rise in both technical and data debt, view AI as a risk in both areas, and also tie both to what they see as a larger problem.
When enterprises comment to me about “debt”, they tend to focus on what they see as a kind of IT populism, the increasing direct use of IT tools by line departments. CIOs and other IT professionals understand this; there’s pressure on line management to improve their operations, and the time required to engage internal IT is often seen as an obstacle. They also realize that as-a-service trends have made applications more accessible to line organizations’ staff. Yes, there is surely some IT parochialism involved here, but it does seem clear that the impact of what’s often called “citizen development” on technical and data debt has been under-appreciated.
Line organizations are parochial in their own thinking, by design. Companies are organized by role in the business, and you can’t have everyone running out to do others’ jobs without coordination. The same thing is true of development, of IT in any form. In the past, I’ve noted that things like cloud computing, particularly SaaS, and low-/no-code are most likely to be successful if there is some IT coordination, at least in the initial design, and particularly as a means of ensuring that organizations with interlocking activity don’t end up building their own silos.
Almost half of enterprises say that they impose little or no policy constraints on citizen developers, and almost two-thirds of this group say it’s not necessary. Of the slight majority who do set constraints, the most common is a restriction on “chaining” applications, meaning having citizen developers write applications that run on the output of other such applications. However, it’s interesting to note that just over a quarter of enterprises who set that constraint don’t prevent citizen applications from creating databases or database records, and of course that can easily lead to the chaining they theoretically forbid. It’s also, I think, the source of a lot of serious data-debt risk.
Almost all enterprises say that it’s possible that citizen developers might create data that is redundant, contradictory, or flat-out incorrect. Often, some say, the duplicated data is in a different format from an IT source that the citizen developer didn’t know about. Five enterprises that set rigid controls say they had a major problem with data integrity that arose from use of low-/no-code tools, and now require an audit on such applications.
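To make the format-mismatch problem concrete, here’s a minimal, purely hypothetical sketch (the record fields and formats are invented for illustration) of how the “same” data, captured twice in different formats, slips past a naive duplicate check:

```python
from datetime import datetime

# Hypothetical records: an IT system uses ISO dates and a canonical
# company name; a citizen developer's spreadsheet export does not.
it_record      = {"customer": "Acme Corp",  "signed": "2024-03-05"}
citizen_record = {"customer": "ACME CORP.", "signed": "03/05/2024"}

# A naive equality check sees two distinct records, so both survive
# and the duplication (and any later contradiction) goes unnoticed.
print(it_record == citizen_record)  # False

def normalize(rec):
    """Normalize name casing/punctuation and parse either date format."""
    name = rec["customer"].rstrip(".").upper()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            signed = datetime.strptime(rec["signed"], fmt).date()
            break
        except ValueError:
            continue
    return (name, signed)

# Only after normalization does the duplication surface.
print(normalize(it_record) == normalize(citizen_record))  # True
```

The point of the sketch is that nothing here is exotic; the debt accumulates precisely because no one is running even this simple a reconciliation between IT-managed and citizen-created data.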
One area where data debt seems most likely relates to Office applications, spreadsheets and databases. Not only are these often passed around among workers, they are sometimes imported into major and even core applications. Spreadsheets were the big data-debt problem for four of the five enterprises who found it necessary to clamp down on citizen developer practices, but all four admit that they really have no way of knowing whether workers with Excel skills are conforming to policy. Half admit they suspect they are not.
How about AI? Only a few (less than one in ten) enterprises have considered the impact of AI on data debt, but all of them expressed some common concerns. Most of these concerns, while not necessarily spreadsheet-specific, relate to spreadsheets.
One of the common value propositions for AI copilot technology involves assisting in the creation or analysis of spreadsheets, and this format is regularly used within line organizations for “casual” analysis of data. I’ve seen, in client companies, issues with what we’d now call “data debt” in Excel spreadsheets and Microsoft Access databases almost from the first, well before AI. But AI might well make things worse.
AI copilot technology used in development organizations is regularly characterized by enterprises as a “junior programmer”. They believe the results of AI code generation require collaborative code review to prevent the classic technical debt problem. Surely the same sort of problem could happen with Office tools, and I’ve seen AI-assisted Word documents and AI research results that were truly awful in terms of quality. Could we expect our line worker, who obviously feels a need for assistance in the use of Excel, to understand the results and audit data quality? Obviously, no.
Enterprises almost never offer AI-linked comments on data debt at this point (which I think means any purported research on the topic has major risks), but remember that one of the long-standing complaints enterprises have offered on AI results is the difficulty associated with tracing the steps taken to get those results. Any given AI result could be a “hallucination”, and work to allow AI to retain context through complex analysis means chaining those results. Can we trust them? If there’s even a five percent error/hallucination rate in any single AI analysis, the chance that at least one error creeps into four chained analyses is nearly one in five, and it compounds with every additional step. And, would we know if that happened?
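The compounding arithmetic is worth spelling out. A minimal sketch, assuming a uniform per-step error rate and independent errors (both simplifications), shows how quickly chained analyses degrade:

```python
# Sketch of how a per-step error rate compounds across chained AI
# analyses. Assumes each step is independently correct with
# probability (1 - error_rate), which is a simplifying assumption.
error_rate = 0.05   # 5% chance any single analysis is wrong
steps = 4           # four chained analyses

p_all_correct = (1 - error_rate) ** steps   # 0.95 ** 4
p_any_error = 1 - p_all_correct

print(f"P(all {steps} steps correct): {p_all_correct:.3f}")  # ~0.815
print(f"P(at least one error):      {p_any_error:.3f}")      # ~0.185
```

At ten chained steps the same 5% rate pushes the chance of at least one error past 40%, which is why traceability of intermediate AI results matters so much.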
Data debt is a real risk, perhaps a greater risk than technical/code debt because of the “garbage in, garbage out” truth of IT. While there are surely benefits to AI, and to broader “citizen developer” participation, there doesn’t seem to be much doubt that both can contribute to data debt, and that would work against the business case for AI. You can’t improve company operations when the core data you use is being eroded in quality by the very mechanisms you’re relying on to make things better.