When a company decides to offer AI to customers or build internal AI tools, it’s not just a matter of implementing some new software. You can’t buy a subscription to an API and call it good. That one decision sets off a chain of other decisions, and most of them have nothing to do with AI itself.

They have to do with your data.

Inventory

Before a single model gets trained or a single prompt gets answered, someone has to work through some very unglamorous questions. They sound simple: What data do you actually have? Where does it live? Is it accurate? But when most organizations go looking, they find data scattered across systems that don’t talk to each other, duplicated in ways nobody intended, and accurate enough for the people who use it daily but nowhere near reliable enough to train a system that will make decisions at scale.

Once the inventory is complete, you need to decide, use case by use case, what format the data should take and where it should be stored. This is architecture work, not AI work.
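To make “what data do you actually have?” a little more concrete, here is a minimal Python sketch of one inventory step: finding records that live in more than one system. It assumes each system can export its records as a list of dicts. The system names (`crm`, `billing`, `support`), the fields, and the helper functions are all hypothetical. Records are matched on a hash of normalized identity fields, which is only one of many ways to detect duplicates.

```python
import hashlib
from collections import defaultdict

# Hypothetical exports from three systems. In practice these rows would
# come from database queries, API pulls, or file drops.
SYSTEMS = {
    "crm": [
        {"customer_id": "C001", "email": "ana@example.com"},
        {"customer_id": "C002", "email": "ben@example.com"},
    ],
    "billing": [
        {"customer_id": "C001", "email": "ANA@example.com "},
    ],
    "support": [
        {"customer_id": "C003", "email": "cy@example.com"},
    ],
}


def fingerprint(record: dict, keys: tuple) -> str:
    """Hash the normalized values of the fields that identify a record."""
    normalized = "|".join(str(record.get(k, "")).strip().lower() for k in keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_inventory(systems: dict, keys: tuple) -> dict:
    """Count records per system and find records held in more than one system."""
    holders = defaultdict(set)  # fingerprint -> names of systems holding it
    counts = {}
    for name, rows in systems.items():
        counts[name] = len(rows)
        for row in rows:
            holders[fingerprint(row, keys)].add(name)
    duplicates = {fp: sorted(names) for fp, names in holders.items() if len(names) > 1}
    return {"record_counts": counts, "cross_system_duplicates": duplicates}


if __name__ == "__main__":
    report = build_inventory(SYSTEMS, keys=("customer_id", "email"))
    print(report["record_counts"])
    for systems_holding in report["cross_system_duplicates"].values():
        print("duplicated in:", systems_holding)
```

Normalizing before hashing (trimming whitespace, lowercasing) is what lets `ANA@example.com ` in billing match `ana@example.com` in the CRM. A real inventory would also capture owners, formats, and refresh schedules, which feed directly into the architecture decisions.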

Quality

Once you’ve decided where the data will live, the next issue is quality. Determining quality means first defining and building a measuring stick, then using it to find the data that is wrong, inconsistent, or incomplete, and fixing it. What that measuring stick looks like depends on what the team needs from its AI tools. Is the tool supposed to predict sales per product line per quarter? Then the historical sales data needs to be complete, consistent, and tagged in a way the model can interpret.
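Here is one way the measuring stick for the sales example might look, as a minimal sketch. Every rule below is an assumption for illustration: revenue must be present and non-negative, the product line must come from a known list, and quarters must be written as `YYYY-Qn`. The field names and sample rows are also made up. Your own rules should come from what the model actually needs.

```python
import re

# Hypothetical quarterly sales rows; field names are illustrative.
SALES = [
    {"product_line": "Bikes", "quarter": "2024-Q1", "revenue": 120000.0},
    {"product_line": "Bikes", "quarter": "2024-Q2", "revenue": None},
    {"product_line": "bikes ", "quarter": "2024-Q3", "revenue": 98000.0},
    {"product_line": "Helmets", "quarter": "2024Q4", "revenue": 15000.0},
]

# Assumed rules: a controlled list of product lines and a YYYY-Qn quarter format.
VALID_PRODUCT_LINES = {"Bikes", "Helmets"}
QUARTER_PATTERN = re.compile(r"^\d{4}-Q[1-4]$")


def check_row(row: dict) -> list:
    """Return a description of every quality rule this row fails."""
    issues = []
    revenue = row.get("revenue")
    if revenue is None:
        issues.append("incomplete: revenue missing")
    elif revenue < 0:
        issues.append("invalid: negative revenue")
    if row.get("product_line") not in VALID_PRODUCT_LINES:
        issues.append("inconsistent: unknown product line")
    if not QUARTER_PATTERN.match(str(row.get("quarter", ""))):
        issues.append("inconsistent: quarter not in YYYY-Qn format")
    return issues


def measure(rows: list) -> tuple:
    """Return the share of rows passing every rule, plus failures by row index."""
    failures = {}
    for index, row in enumerate(rows):
        issues = check_row(row)
        if issues:
            failures[index] = issues
    pass_rate = 1 - len(failures) / len(rows) if rows else 1.0
    return pass_rate, failures


if __name__ == "__main__":
    rate, failures = measure(SALES)
    print(f"pass rate: {rate:.0%}")
    for index, issues in failures.items():
        print(index, issues)
```

On the sample data, three of the four rows fail: one for missing revenue, one for the untidy `"bikes "` label, and one for the `2024Q4` quarter. The pass rate gives the team a single number to track as the data is cleaned.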

Classification

Every piece of data needs to be understood in terms of what it is, how sensitive it is, and what rules govern it. That means a classification system has to be in place before AI is layered on top. Without classification, sensitive data (personal information, financial records, regulated data) can end up in training sets where it doesn’t belong, and once it’s in the model it’s nearly impossible to remove. Depending on your industry, that may be not just a technical problem but a legal and regulatory one.
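A classification scheme can be as simple as a sensitivity level per field, enforced at the moment training data is extracted. The sketch below uses four hypothetical levels and a made-up customer table. It is one possible approach, not a standard. One design choice worth copying: fields nobody has classified are excluded by default rather than let through.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    REGULATED = 3  # e.g. personal, financial, or other regulated data


# Hypothetical field-level classification for a customer table.
CLASSIFICATION = {
    "customer_id": Sensitivity.INTERNAL,
    "region": Sensitivity.PUBLIC,
    "lifetime_value": Sensitivity.CONFIDENTIAL,
    "email": Sensitivity.REGULATED,
    "tax_id": Sensitivity.REGULATED,
}


def training_view(record: dict, max_level: Sensitivity) -> dict:
    """Keep only fields classified at or below max_level.

    Fields missing from CLASSIFICATION are dropped: unclassified data is
    treated as unsafe to train on until someone reviews it.
    """
    return {
        field: value
        for field, value in record.items()
        if field in CLASSIFICATION and CLASSIFICATION[field] <= max_level
    }


record = {
    "customer_id": "C001",
    "region": "EMEA",
    "lifetime_value": 5400,
    "email": "ana@example.com",
    "tax_id": "000-00-0000",
    "notes": "called twice about billing",
}

if __name__ == "__main__":
    print(training_view(record, Sensitivity.INTERNAL))
```

With `max_level=Sensitivity.INTERNAL`, only `customer_id` and `region` survive. The unclassified `notes` field is dropped even at the most permissive level, because no one has yet decided what it is.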

Governance

I’m sure you’ve heard the word before, and maybe your eyes glazed over. But have you actually looked into what data governance is? Governance is what keeps data secure, compliant, available, and clean. It encompasses the policies and controls that answer questions like these: Who owns the data? Who can access it? Which regulations apply to it? How is its quality monitored? Who is accountable for enforcing those controls? All of these need answers before you can call yourself AI-ready.
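The governance questions above map naturally onto a per-dataset policy record. This is a minimal sketch, not a governance platform. The `DatasetPolicy` fields, role names, contact address, and GDPR tag are illustrative assumptions. A real program would keep these records in a data catalog and enforce them in the access layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPolicy:
    """One governance record per dataset, answering the core governance questions."""

    name: str
    owner: str                  # who owns it
    allowed_roles: frozenset    # who can access it
    regulations: tuple          # which rules apply to it
    quality_checks: tuple       # how its quality is monitored
    steward: str                # who is accountable for enforcement


# Hypothetical policy catalog with a single dataset.
POLICIES = {
    "customer_profiles": DatasetPolicy(
        name="customer_profiles",
        owner="Head of Customer Operations",
        allowed_roles=frozenset({"data_engineer", "support_analyst"}),
        regulations=("GDPR",),
        quality_checks=("no_missing_customer_id", "valid_email_format"),
        steward="data-steward@example.com",
    ),
}


def can_access(role: str, dataset: str) -> bool:
    """Deny by default: unknown datasets and unlisted roles get no access."""
    policy = POLICIES.get(dataset)
    return policy is not None and role in policy.allowed_roles


if __name__ == "__main__":
    print(can_access("data_engineer", "customer_profiles"))
    print(can_access("ml_training_job", "customer_profiles"))
```

Because access is deny-by-default, an AI pipeline only sees a dataset after someone deliberately adds its role to the policy. That deliberate step is exactly what answers “who can access it?” in an auditable way.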

None of this is AI, but all of it is a prerequisite to AI. As CIO magazine points out, data quality issues continue to be one of the largest problems even with data management programs in place. Governance is what drives the quality policies and is therefore at the forefront of many companies’ data and AI readiness campaigns. Additionally, according to a 2026 IDC white paper, one in four enterprises cannot determine ROI from their AI investments, most often because the data foundation underneath was never properly established before the model was deployed.

The scope of this work has real staffing implications that most organizations underestimate. For startups, the second hire should be a seasoned data engineer or data architect. If data and AI are core to the business plan, that person will be the most valuable in the organization. For enterprises, the volume of work involved in building or remediating a data foundation takes a village. Wading through legacy systems, creating order out of years of accumulated chaos, and building the governance structures that AI requires takes a team of seasoned architects, infrastructure engineers, DevOps engineers, and data engineers who can think at the system level. Underestimating the resourcing this requires is one of the most common reasons AI initiatives never make it past the proof-of-concept (POC) stage.

The Risks You’re Not Thinking About

The prerequisites above may look like nothing more than a checklist of “to dos.” On the surface, that’s exactly what they are. But here is what can happen if you skip them.

Hallucinations

If you follow AI even loosely, you know hallucinations aren’t just the monsters we see on the trail 15 hours into a 24-hour adventure race. In AI, hallucinations are confident but false, misleading, or fabricated responses generated by large language models. They aren’t random AI weirdness. They happen when a model favors pattern matching over factual correctness because of gaps or inconsistencies in its training data, biased data, or over-optimization. Because of how LLMs work under the hood, there will probably always be some circumstances in which they hallucinate. However, clean, complete, high-quality data goes a long way toward reducing the chance of hallucinations, and that work starts long before the model is ever trained. It starts with your data foundation.

How much could hallucinations cost you? The time lost chasing false information is hard to measure, but global financial losses tied to AI hallucinations hit $67.4 billion in 2024, with enterprises spending approximately $14,200 per employee annually on hallucination mitigation efforts (AllAboutAI, 2025). AI is improving on a steep curve, so these numbers are hopefully less drastic today. Even so, major decisions are still being made on the basis of hallucinated output.

Security Exposure

AI systems that ingest enterprise data are a new and attractive attack surface. Without a governed, classified data foundation, you have no way of knowing what sensitive information your AI system is ingesting, storing, or exposing. Meanwhile, employees paste proprietary data into public AI tools, and developers ship AI-powered applications without security review.

IBM’s 2025 Cost of a Data Breach report found that 13% of organizations have already experienced breaches of AI models or applications, and 97% of those lacked proper AI data access controls. The security implications of AI go deep, and for a closer look at the specific threat vectors, this overview is worth a read.

Regulatory and Legal Liability

Governments are catching up to AI faster than organizations are catching up to their data. The frameworks being written assume you already have governed, classified, and auditable data. If you don’t, you’re not just technically unprepared. You’re building regulatory liability into your AI program from day one.

The NIST AI Risk Management Framework (AI RMF), which is voluntary but is rapidly becoming the de facto standard, has data governance baked into its foundation. It explicitly calls on organizations to update their existing data governance and data privacy policies, particularly around the use of sensitive data, as part of any AI governance program and to meet the requirements of emerging AI laws. The EU AI Act is already in force, and we can expect laws in the US to follow suit. Organizations that wait for regulation to force the issue will find that retrofitting data governance under legal pressure is significantly more expensive than building it correctly from the start.

It Really Is an Iceberg

What you see when you look at a successful AI deployment is the surface: the interface, the output, the demo that impressed the C-suite. What you don’t see is everything that had to exist before any of that was possible. A governed data foundation with classified, quality-controlled, and auditable data had to be in place. Security controls and observability had to exist so someone knew what was flowing into and out of the systems. Regulatory readiness had to be built in. That’s the iceberg. Most of it is underwater, and none of it is glamorous.

The good news is you have a choice. You can build it now, deliberately, on your own timeline, with the right people and the right architecture. Or you can deploy AI anyway, hit the inevitable walls (hallucinations, a breach, a regulatory inquiry, an AI that confidently tells your customers the wrong thing), and then backpedal.

The question was never really about AI. It was always about the data underneath it.

A Structured Approach

If what you’ve read here sounds familiar, it’s because this is the work we do. Infosight helps organizations build the data foundations that AI actually requires: cloud data engineering, architecture, governance, classification, and the strategic planning that ties it all together. Whether you’re standing up a new data platform, remediating years of accumulated technical debt, or preparing for the regulatory landscape that’s already arriving, we bring the engineering depth and the systems-level thinking to get it done right the first time.
