
Data Readiness for AI in Manufacturing: A Checklist

By Jason Osajima, former VP of AI at a $250M manufacturer

Quick answer

Data readiness for AI in manufacturing: a practical checklist for COOs, covering what actually has to be true before an agent works and what's a myth.

The phrase "data readiness for AI" gets used as a reason to delay. "We can't do agents until our data is clean." I've heard it in every plant I've walked. It's half right. Your data does need to clear a bar, but the bar is far lower and far more specific than the consultants telling you to spend two years on a data lake first would like you to believe. I ran AI at a $250M furniture manufacturer with data that was, charitably, a mess. We shipped anyway. Here's the actual checklist.

The core misunderstanding is treating data readiness as one giant binary state your whole company has to reach. It isn't. Readiness is per use case. The data needed for a supplier-document agent has nothing to do with the data needed for a demand-planning agent. You don't get your data ready. You get the data for one agent ready, ship it, then do the next.

The five-question readiness check

For any agent you're considering, run its data through these five questions. This is the whole framework. If a use case passes, build it. If it fails on a question, you know exactly what to fix instead of waving at "data quality" in the abstract.

  1. Accessible — Can software read this data without a human typing? (DB query, API, scheduled export.) If a person has to copy-paste it, the agent can't use it.
  2. Complete enough — Are the fields the agent needs actually populated, most of the time? Not perfect. Most of the time.
  3. Consistent — Is an entity represented the same way across records? One spelling per supplier, one format per part number, consistent units.
  4. Current — Is the data fresh enough for the decision? A planning agent tolerates yesterday's data. A live-status agent doesn't.
  5. Trustworthy — Do the people who'd use the agent already trust this data source? If ops doesn't believe the ERP's lead-time field today, an agent reading it won't change their mind.

Score each question 0-2 (no / partial / yes). A use case scoring 8+ out of 10 is ready to build. A 5-7 is buildable with scoping — narrow it to the clean subset. Below 5, fix the data first or pick a different agent.
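
The scoring rule is simple enough to put in a few lines of code, which helps when several people are scoring candidate agents and you want the same thresholds applied every time. The sketch below is one way to do it; the names `QUESTIONS`, `ReadinessScore`, and `score_use_case` are made up for illustration, and the example scores are hypothetical, not measurements.

```python
from dataclasses import dataclass

# The five questions, in checklist order. Identifier names are illustrative.
QUESTIONS = ("accessible", "complete_enough", "consistent", "current", "trustworthy")


@dataclass
class ReadinessScore:
    total: int
    verdict: str
    weakest: list[str]  # questions scored below 2, lowest score first


def score_use_case(answers: dict[str, int]) -> ReadinessScore:
    """Score one candidate agent. Each answer is 0 (no), 1 (partial), or 2 (yes)."""
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"missing answers for: {missing}")
    for q in QUESTIONS:
        if answers[q] not in (0, 1, 2):
            raise ValueError(f"{q} must be 0, 1, or 2, got {answers[q]!r}")

    total = sum(answers[q] for q in QUESTIONS)
    if total >= 8:
        verdict = "build"
    elif total >= 5:
        verdict = "narrow scope to the clean subset"
    else:
        verdict = "fix the data first or pick a different agent"

    # Tell the team exactly which dimension to fix, worst first.
    weakest = sorted((q for q in QUESTIONS if answers[q] < 2), key=lambda q: answers[q])
    return ReadinessScore(total, verdict, weakest)


# Hypothetical scores for a supplier-document agent.
supplier_docs = score_use_case(
    {"accessible": 2, "complete_enough": 2, "consistent": 1, "current": 2, "trustworthy": 2}
)
print(supplier_docs.total, supplier_docs.verdict, supplier_docs.weakest)
```

Returning the weakest questions alongside the verdict keeps the conversation on a specific failing dimension rather than on "data quality" in general.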

What readiness does NOT require

Just as important is killing the myths that cause the two-year delay:

The instinct to fix everything first is how AI initiatives spend eighteen months and ship nothing.

Readiness by use case

Different agents have wildly different data demands. Here's the realistic picture for the common first agents:

Agent | Data needed | Typical readiness | Why
Supplier-doc intelligence | PDFs: specs, certs, datasheets, POs | High | Documents don't need clean structure; RAG handles unstructured text
Order-status / exception lookup | ERP order + status tables | Medium-high | Usually accessible; depends on consistency
Ops-review prep | ERP + BI exports | Medium | Needs joined data; tolerates batch latency
Demand / inventory Q&A | Planning + inventory data | Medium | Needs consistency and currency to be trusted
Quality / defect analysis | MES + inspection records | Lower | Often messy, free-text, inconsistent capture

Notice the pattern: document-heavy agents are the easiest to make ready because unstructured text doesn't need the clean schema structured data does. A supplier-doc agent works on the PDFs sitting in a SharePoint folder right now. That's why it's often the best first build — the data is already "ready" by definition.

Structured-data agents demand more on consistency and currency. Quality and defect agents tend to fail the consistency check because shop-floor capture is inconsistent and free-text-heavy. That doesn't mean never. It means later, after you've cleaned that domain.
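
For structured data, three of the five questions (complete enough, consistent, current) can be estimated from a sample export rather than argued about. The sketch below shows one rough way to do that with the standard library only. The function names, field names, and sample rows are all hypothetical, and the "same spelling" rule (ignore case, spacing, and punctuation) is an assumption you would tune per field.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def _populated(value):
    """A field counts as populated if it is not None and not blank."""
    return value is not None and str(value).strip() != ""


def completeness(records, required_fields):
    """Share of records in which every required field is populated."""
    if not records:
        return 0.0
    ok = sum(all(_populated(r.get(f)) for f in required_fields) for r in records)
    return ok / len(records)


def spelling_variants(records, field):
    """Find values that differ only by case, spacing, or punctuation.

    Returns {normalized_key: [raw spellings]} for keys with more than one spelling.
    """
    groups = defaultdict(set)
    for r in records:
        raw = r.get(field)
        if not _populated(raw):
            continue
        key = "".join(ch for ch in str(raw).lower() if ch.isalnum())
        if key:
            groups[key].add(str(raw))
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


def is_current(records, timestamp_field, max_age, now):
    """True if the newest record is no older than max_age."""
    stamps = [r[timestamp_field] for r in records if r.get(timestamp_field)]
    return bool(stamps) and now - max(stamps) <= max_age


# Hypothetical sample rows standing in for an ERP export.
sample = [
    {"supplier": "Acme Fasteners", "part_no": "AF-1001", "updated": datetime(2024, 5, 1, 6, 0)},
    {"supplier": "ACME FASTENERS", "part_no": "AF-1002", "updated": datetime(2024, 5, 1, 7, 30)},
    {"supplier": "Acme  Fasteners.", "part_no": "", "updated": datetime(2024, 4, 30, 22, 0)},
    {"supplier": "Birch & Co", "part_no": "BC-0007", "updated": datetime(2024, 5, 1, 8, 0)},
]
now = datetime(2024, 5, 1, 12, 0)

print(completeness(sample, ["supplier", "part_no"]))  # 0.75
print(spelling_variants(sample, "supplier"))
print(is_current(sample, "updated", timedelta(days=1), now))  # True
```

Numbers like these only inform the 0-2 score; what counts as "most of the time" or "fresh enough" still depends on the decision the agent supports.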

Fix data inside the pipeline, not the source

When a use case scores partial on consistency or completeness, the wrong move is a project to clean the source system. That's slow and political. The right move is to handle it in the data pipeline feeding the agent:

This keeps cleanup scoped to what one agent needs, and it compounds — each agent you build leaves that data domain a little cleaner for the next.
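
As a concrete illustration of pipeline-side cleanup, here is a minimal sketch of a cleaning step that sits between the source export and the agent. Everything in it is an assumption made for the example: the alias table, the unit conversions, the field names, and the choice to route unresolved rows to a review queue. The source system is never modified.

```python
import re

# Hypothetical alias table kept alongside the pipeline, not in the ERP.
SUPPLIER_ALIASES = {
    "acmefasteners": "Acme Fasteners",
    "acmefastenersinc": "Acme Fasteners",
}

# Hypothetical conversions to one canonical unit (millimetres).
TO_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4}


def _key(name):
    """Collapse case, spacing, and punctuation so spellings can be matched."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def clean_record(raw):
    """Return (cleaned_copy, issues). The input record is left untouched."""
    issues = []
    rec = dict(raw)

    supplier = str(raw.get("supplier") or "").strip()
    canonical = SUPPLIER_ALIASES.get(_key(supplier))
    if canonical is None:
        issues.append(f"unknown supplier: {supplier!r}")
    else:
        rec["supplier"] = canonical

    unit = str(raw.get("length_unit") or "").strip().lower()
    length = raw.get("length")
    if unit in TO_MM and isinstance(length, (int, float)):
        rec["length_mm"] = length * TO_MM[unit]
    else:
        issues.append(f"cannot convert length: {length!r} {unit!r}")

    return rec, issues


def prepare_for_agent(rows):
    """Split rows into the clean subset the agent reads and a review queue."""
    clean, review = [], []
    for row in rows:
        rec, issues = clean_record(row)
        if issues:
            review.append((row, issues))
        else:
            clean.append(rec)
    return clean, review
```

The agent reads only the first list. The second list is the scoped cleanup backlog for this one use case, with a reason attached to each row.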

The readiness sequence that works

Don't run a company-wide data-readiness program. Run this loop:

  1. Pick a candidate agent. Score its data on the five questions.
  2. If it scores 8+, build it. If 5-7, narrow scope to the clean subset and build that. If under 5, pick a different agent or fix the specific failing dimension.
  3. Ship, measure, clean the data the next agent will need while this one runs.

Start with a document-heavy agent — supplier-doc intelligence is the usual winner — because its data is ready today. Get a win on the board in 30 days. Use that momentum and the cleaned data to tackle the structured-data agents next.

Data readiness for AI is real, but it's a checklist you run per agent, not a destination you reach before starting.


Want to know which of your agents are data-ready right now? Grab the free First 5 Agents teardown — I'll score your top five candidate agents against the five-question check and tell you which to build first and which need a data fix. Then book a 20-minute call and we'll find the one agent your data can support this month.

